
Polarization

Lecture 9: Polar Coding

I-Hsiang Wang

Department of Electrical Engineering
National Taiwan University

[email protected]

December 29, 2015


In Pursuit of Shannon’s Limit

Since 1948, Shannon’s theory has drawn the sharp boundary between the possible and the impossible in data compression and data transmission.

Once fundamental limits are characterized, the next natural question is: how can these limits be achieved with acceptable complexity?

For source coding, soon after Shannon’s 1948 paper, information and coding theorists found optimal compression schemes with low complexity:

Huffman code (1952): optimal for memoryless sources
Lempel-Ziv (1977): optimal for stationary ergodic sources

Channel coding, on the other hand, turns out to be a much harder problem. It has been the holy grail for coding theorists to find a coding scheme that achieves Shannon’s limit with low complexity.



In Pursuit of Capacity-Achieving Codes

Two barriers in pursuing low-complexity capacity-achieving codes:

1 Lack of explicit construction. Shannon’s proof only shows that there exist coding schemes that achieve capacity.
2 Lack of structure to reduce complexity. In the proofs of coding theorems, complexity issues are often neglected, while codes with structure are hard to prove capacity-achieving.

Since the 1990s, several practical codes have been found to approach capacity, including turbo codes, low-density parity-check (LDPC) codes, etc.
These codes perform very well empirically, but theoretical analysis of their performance, and even proof of their optimality, is still lacking.
The first provably capacity-achieving coding scheme with acceptable complexity is the polar code, introduced by Erdal Arıkan in 2007.
Later, in 2012, spatially coupled LDPC codes were also shown to achieve capacity (Shrinivas Kudekar, Tom Richardson, and Rüdiger Urbanke).



IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009, p. 3051

Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels

Erdal Arıkan, Senior Member, IEEE

Abstract: A method is proposed, called channel polarization, to construct code sequences that achieve the symmetric capacity $I(W)$ of any given binary-input discrete memoryless channel (B-DMC) $W$. The symmetric capacity is the highest rate achievable subject to using the input letters of the channel with equal probability. Channel polarization refers to the fact that it is possible to synthesize, out of $N$ independent copies of a given B-DMC $W$, a second set of $N$ binary-input channels $\{W_N^{(i)} : 1 \le i \le N\}$ such that, as $N$ becomes large, the fraction of indices $i$ for which $I(W_N^{(i)})$ is near 1 approaches $I(W)$ and the fraction for which $I(W_N^{(i)})$ is near 0 approaches $1 - I(W)$. The polarized channels $\{W_N^{(i)}\}$ are well-conditioned for channel coding: one need only send data at rate 1 through those with capacity near 1 and at rate 0 through the remaining. Codes constructed on the basis of this idea are called polar codes. The paper proves that, given any B-DMC $W$ with $I(W) > 0$ and any target rate $R < I(W)$, there exists a sequence of polar codes $\{\mathcal{C}_n ; n \ge 1\}$ such that $\mathcal{C}_n$ has block-length $N = 2^n$, rate $\ge R$, and probability of block error under successive cancellation decoding bounded as $P_e(N, R) \le O(N^{-1/4})$ independently of the code rate. This performance is achievable by encoders and decoders with complexity $O(N \log N)$ for each.

Index Terms: Capacity-achieving codes, channel capacity, channel polarization, Plotkin construction, polar codes, Reed–Muller (RM) codes, successive cancellation decoding.

I. INTRODUCTION AND OVERVIEW

A fascinating aspect of Shannon’s proof of the noisy channel coding theorem is the random-coding method that he used to show the existence of capacity-achieving code sequences without exhibiting any specific such sequence [1]. Explicit construction of provably capacity-achieving code sequences with low encoding and decoding complexities has since then been an elusive goal. This paper is an attempt to meet this goal for the class of binary-input discrete memoryless channels (B-DMCs).

We will give a description of the main ideas and results of the paper in this section. First, we give some definitions and state some basic facts that are used throughout the paper.

Manuscript received October 14, 2007; revised August 13, 2008. Current version published June 24, 2009. This work was supported in part by The Scientific and Technological Research Council of Turkey (TÜBITAK) under Project 107E216 and in part by the European Commission FP7 Network of Excellence NEWCOM++ under Contract 216715. The material in this paper was presented in part at the IEEE International Symposium on Information Theory (ISIT), Toronto, ON, Canada, July 2008.

The author is with the Department of Electrical-Electronics Engineering, Bilkent University, Ankara, 06800, Turkey (e-mail: [email protected]).

Communicated by Y. Steinberg, Associate Editor for Shannon Theory.
Color versions of Figures 4 and 7 in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2009.2021379

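A. Preliminaries

We write $W : \mathcal{X} \to \mathcal{Y}$ to denote a generic B-DMC with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and transition probabilities $W(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$. The input alphabet $\mathcal{X}$ will always be $\{0, 1\}$, the output alphabet and the transition probabilities may be arbitrary. We write $W^N$ to denote the channel corresponding to $N$ uses of $W$; thus, $W^N : \mathcal{X}^N \to \mathcal{Y}^N$ with $W^N(y_1^N | x_1^N) = \prod_{i=1}^{N} W(y_i | x_i)$.

Given a B-DMC $W$, there are two channel parameters of primary interest in this paper: the symmetric capacity

$$I(W) \triangleq \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} \frac{1}{2} W(y|x) \log \frac{W(y|x)}{\frac{1}{2} W(y|0) + \frac{1}{2} W(y|1)}$$

and the Bhattacharyya parameter

$$Z(W) \triangleq \sum_{y \in \mathcal{Y}} \sqrt{W(y|0)\, W(y|1)}.$$

These parameters are used as measures of rate and reliability, respectively. $I(W)$ is the highest rate at which reliable communication is possible across $W$ using the inputs of $W$ with equal frequency. $Z(W)$ is an upper bound on the probability of maximum-likelihood (ML) decision error when $W$ is used only once to transmit a 0 or 1.

It is easy to see that $Z(W)$ takes values in $[0, 1]$. Throughout, we will use base-2 logarithms; hence, $I(W)$ will also take values in $[0, 1]$. The unit for code rates and channel capacities will be bits.

Intuitively, one would expect that $I(W) \approx 1$ iff $Z(W) \approx 0$, and $I(W) \approx 0$ iff $Z(W) \approx 1$. The following bounds, proved in the Appendix, make this precise.

Proposition 1: For any B-DMC $W$, we have

$$I(W) \ge \log \frac{2}{1 + Z(W)} \qquad (1)$$

$$I(W) \le \sqrt{1 - Z(W)^2} \qquad (2)$$

The symmetric capacity $I(W)$ equals the Shannon capacity when $W$ is a symmetric channel, i.e., a channel for which there exists a permutation $\pi$ of the output alphabet $\mathcal{Y}$ such that i) $\pi^{-1} = \pi$ and ii) $W(y|1) = W(\pi(y)|0)$ for all $y \in \mathcal{Y}$. The binary symmetric channel (BSC) and the binary erasure channel (BEC) are examples of symmetric channels. A BSC is a B-DMC $W$ with $\mathcal{Y} = \{0, 1\}$, $W(0|0) = W(1|1)$, and $W(1|0) = W(0|1)$. A B-DMC $W$ is called a BEC if for each $y \in \mathcal{Y}$, either $W(y|0) W(y|1) = 0$ or $W(y|0) = W(y|1)$. In the latter case,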

The paper won the 2010 Information Theory Society Best Paper Award.



Overview

When Arıkan introduced polar codes in 2007, he focused on achieving capacity for general binary-input memoryless symmetric (BMS) channels, including the BSC, the BEC, etc.
Later, polar codes were shown to be optimal in many other settings, including lossy source coding, non-binary-input channels, multiple access channels, source coding with side information (Wyner-Ziv problem), etc.
Instead of giving a comprehensive introduction, we shall introduce channel polarization and polar coding for BMS channels, in the following order:

1 First we introduce the concept of channel polarization.
2 Then we explore polar coding.



Notations

Recall that in channel coding, we use the DMC N times, with N being the blocklength of the coding scheme.
Since the channel is the main focus, we shall use the following notation throughout this lecture:

W to denote the channel p_{Y|X}
P to denote the input distribution p_X
I(P, W) to denote I(X; Y).

Besides, since we focus on BMS channels, and it is not difficult to prove that X ∼ Ber(1/2) achieves the channel capacity of any BMS channel, we shall use I(W) (abuse of notation) to denote I(P, W) when the input distribution P is Ber(1/2).

In other words, the channel capacity of the BMS channel W is I(W).


Outline

1 Polarization
    Basic Channel Transformation
    Channel Polarization



Single Use of Channel W

[Figure: X → W → Y]

N Uses of Channel W

[Figure: message M → ENC → (X1, X2, ..., XN) → N copies of W → (Y1, Y2, ..., YN) → DEC → decoded message]



Arıkan’s Idea

[Figure: (U1, U2, ..., UN) → Pre-Processing → (X1, X2, ..., XN) → N copies of W → (Y1, Y2, ..., YN) → Post-Processing → (V1, V2, ..., VN)]

Apply special transforms to both input and output.






Arıkan’s Idea

[Figure: N synthetic channels W1, W2, ..., WN, where Wi carries Ui to Vi.]

Roughly N I(W) channels with capacity ≈ 1

Roughly N (1 − I(W)) channels with capacity ≈ 0

Equivalently, some perfect channels and some useless channels ⟶ Polarization

Coding becomes extremely simple: use the perfect channels for uncoded transmission, and throw the useless channels away.







Arıkan’s Basic Channel Transformation

Consider two channel uses of W.
Apply the pre-processor: X1 = U1 ⊕ U2, X2 = U2, where U1 ⊥⊥ U2 and U1, U2 ∼ Ber(1/2).

[Figure: U1 and U2 enter the pre-processor; X1 = U1 ⊕ U2 and X2 = U2 pass through two copies of W, producing Y1 and Y2.]

We now have two synthetic channels induced by the above procedure:

W− : U1 → V1 ≜ (Y1, Y2)

W+ : U2 → V2 ≜ (Y1, Y2, U1)

The above transform yields the following two crucial phenomena:
I(W−) ≤ I(W) ≤ I(W+) (Polarization)
I(W−) + I(W+) = 2I(W) (Conservation of Information)
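The two synthetic channels can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes a binary-input channel is stored as a table W[x][y] of transition probabilities over a finite output alphabet, and the function names (sym_capacity, minus_channel, plus_channel, bec, bsc) are hypothetical helpers introduced only for illustration.

```python
from itertools import product
from math import log2


def sym_capacity(W):
    """I(W) in bits for a channel table W[x][y] with a uniform binary input."""
    total = 0.0
    for y in range(len(W[0])):
        q = 0.5 * (W[0][y] + W[1][y])  # output probability under Ber(1/2) input
        for x in (0, 1):
            if W[x][y] > 0:
                total += 0.5 * W[x][y] * log2(W[x][y] / q)
    return total


def minus_channel(W):
    """W-: input U1, output (Y1, Y2); U2 is averaged out as Ber(1/2)."""
    n = len(W[0])
    return [[sum(0.5 * W[u1 ^ u2][y1] * W[u2][y2] for u2 in (0, 1))
             for y1, y2 in product(range(n), repeat=2)]
            for u1 in (0, 1)]


def plus_channel(W):
    """W+: input U2, output (Y1, Y2, U1); U1 is Ber(1/2) and observed."""
    n = len(W[0])
    return [[0.5 * W[u1 ^ u2][y1] * W[u2][y2]
             for y1, y2, u1 in product(range(n), range(n), (0, 1))]
            for u2 in (0, 1)]


def bec(eps):
    """BEC table; output symbols are 0, 1, and erasure (index 2)."""
    return [[1 - eps, 0.0, eps], [0.0, 1 - eps, eps]]


def bsc(p):
    """BSC table with crossover probability p."""
    return [[1 - p, p], [p, 1 - p]]


if __name__ == "__main__":
    for name, W in (("BEC(0.3)", bec(0.3)), ("BSC(0.11)", bsc(0.11))):
        print(name, sym_capacity(minus_channel(W)),
              sym_capacity(W), sym_capacity(plus_channel(W)))
```

Running the script prints I(W−), I(W), and I(W+) for one BEC and one BSC, so both the ordering and the sum 2I(W) can be inspected directly.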


Example: Binary Erasure Channel

Example 1
Let W be a BEC with erasure probability ε ∈ (0, 1), so that I(W) = 1 − ε. Find the values of I(W−) and I(W+), and verify the above properties.

sol: Intuitively, W− is worse than W and W+ is better than W:
For W−, the input is U1 and the output is (Y1, Y2). Only when both Y1 and Y2 are not erased can one figure out U1
⟹ W− is a BEC with erasure probability 1 − (1 − ε)² = 2ε − ε².
For W+, the input is U2 and the output is (Y1, Y2, U1). As long as one of Y1 and Y2 is not erased, one can figure out U2
⟹ W+ is a BEC with erasure probability ε².

Hence, I(W−) = 1 − 2ε + ε² and I(W+) = 1 − ε².
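The verification asked for in the example is not written out on the slide. One way to carry it out, using only the two capacities just computed, is sketched below.

```latex
\begin{align*}
I(W^-) + I(W^+) &= (1-\varepsilon)^2 + (1-\varepsilon^2) = 2 - 2\varepsilon = 2\,I(W), \\
I(W) - I(W^-)   &= (1-\varepsilon) - (1-\varepsilon)^2 = \varepsilon(1-\varepsilon) > 0, \\
I(W^+) - I(W)   &= (1-\varepsilon^2) - (1-\varepsilon) = \varepsilon(1-\varepsilon) > 0 .
\end{align*}
```

Both inequalities are strict because ε ∈ (0, 1), so the BEC polarizes strictly at every step unless it is already perfect or useless.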



Example: Binary Symmetric Channel

Example 2
Let W be a BSC with crossover probability p ∈ (0, 1), so that I(W) = 1 − Hb(p). Find the values of I(W−) and I(W+).
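The slide leaves this example unsolved. A possible solution sketch follows, under the assumption that the BSC is written as Y_i = X_i ⊕ Z_i with Z_1, Z_2 independent Ber(p) noise bits (the names Z_1, Z_2 are introduced here for illustration). Since Y_1 ⊕ Y_2 = U_1 ⊕ Z_1 ⊕ Z_2 and Y_2 = U_2 ⊕ Z_2 is uniform and independent of (U_1, Y_1 ⊕ Y_2), only Y_1 ⊕ Y_2 carries information about U_1, and Z_1 ⊕ Z_2 is Ber(2p(1−p)). The value of I(W^+) then follows from conservation of information.

```latex
\begin{align*}
I(W^-) &= I(U_1; Y_1, Y_2) = I(U_1; Y_1 \oplus Y_2) = 1 - H_b\bigl(2p(1-p)\bigr), \\
I(W^+) &= 2\,I(W) - I(W^-) = 1 + H_b\bigl(2p(1-p)\bigr) - 2H_b(p).
\end{align*}
```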



Basic Properties

Theorem 1
For any BMS channel W and the induced channels {W−, W+} from Arıkan’s basic transformation, we have

I(W−) ≤ I(W) ≤ I(W+), with equality iff I(W) = 0 or 1.
I(W−) + I(W+) = 2I(W)

pf: We prove the conservation of information first:

I(W−) + I(W+) = I(U1; Y1, Y2) + I(U2; Y1, Y2, U1)
= I(U1; Y1, Y2) + I(U2; Y1, Y2 | U1) = I(U1, U2; Y1, Y2)
= I(X1, X2; Y1, Y2) = I(X1; Y1) + I(X2; Y2) = 2I(W).

Next, I(W+) = I(X2; Y1, Y2, U1) ≥ I(X2; Y2) = I(W). Combined with conservation, I(W−) = 2I(W) − I(W+) ≤ I(W), and hence the first property holds. (Proof of the condition for equality is left as an exercise.)



Extremal Channels

The clue is now to realize that we have equality in (12.27) if, and only if, Y1 is conditionally independent of X1 given U1 and Y2. This can happen only in exactly two cases: Either W is useless, i.e., Y1 is independent of X1 and any other quantity related with X1 such that all conditioning disappears and we have H(Y1) − H(Y1) in (12.25) (this corresponds to the situation when I(W) = 0). Or W is perfect so that from Y2 we can perfectly recover U2 and, with the additional help of U1, also X1 (this corresponds to the situation when I(W) = 1 bit).

It can be shown that a BEC yields the largest difference between I(W+) and I(W−) and the BSC yields the smallest difference. Any other DMC will yield something in between. See Figure 12.4 for the corresponding plot.

[Figure 12.4 plot: horizontal axis I(W) [bits], from 0 to 1; vertical axis I(W+) − I(W−) [bits], from 0 to 0.5; two curves labeled BEC and BSC.]

Figure 12.4: Difference of I(W+) and I(W−) as a function of I(W). We see that unless the channel is extreme already, the difference is strictly positive. Moreover, the largest difference is achieved for a BEC, while the BSC yields the smallest difference.

Exercise 12.6. In this exercise you are asked to recreate the boundary curves in Figure 12.4.

1. Start with W being a BEC with erasure probability δ. Recall that I(W) = 1 − δ bits, and then show that I(W+) − I(W−) = 2δ(1 − δ) bits.

2. For W being a BSC with crossover probability ε, recall that I(W) = 1 − Hb(ε) bits. Then, defining Z1 and Z2 being independent binary RVs

© Copyright Stefan M. Moser, version 4.4, 31 Aug. 2015

(Taken from Chap. 12.1 of Moser [4].)

If we plot the “information stretch” I(W+) − I(W−) versus the original information I(W), it can be shown that among all BMS channels:

BEC maximizes the stretch
BSC minimizes the stretch

Lower boundary:

2Hb(2p(1 − p)) − 2Hb(p),

where p = Hb^{-1}(1 − I(W)), with Hb^{-1} taken on [0, 1/2].

Upper boundary:

2I(W)(1 − I(W)).
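The two boundary curves can be evaluated numerically. The sketch below is an illustration rather than part of the slides: it assumes the inverse binary entropy is taken on [0, 1/2] and computed by bisection, and the function names (hb, hb_inv, stretch_bec, stretch_bsc) are hypothetical.

```python
from math import log2


def hb(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def hb_inv(h, tol=1e-12):
    """Inverse of hb restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if hb(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def stretch_bec(i):
    """Upper boundary: I(W+) - I(W-) for a BEC with I(W) = i."""
    return 2 * i * (1 - i)


def stretch_bsc(i):
    """Lower boundary: I(W+) - I(W-) for a BSC with I(W) = i."""
    p = hb_inv(1 - i)
    return 2 * hb(2 * p * (1 - p)) - 2 * hb(p)


if __name__ == "__main__":
    for k in range(11):
        i = k / 10
        print(f"I(W)={i:.1f}  BSC={stretch_bsc(i):.4f}  BEC={stretch_bec(i):.4f}")
```

Tabulating the two functions on a grid of I(W) values gives the points needed to redraw the BEC and BSC curves of the plot.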






Recursive Application of Arıkan’s Transformation

Duplicate W, apply the transformation, and get W− and W+.

[Figure: two copies of W combined by the basic transformation.]






Duplicate W− (and W+).
Apply the transformation on W−, and get W−− and W−+.
Apply the transformation on W+, and get W+− and W++.

...
We can keep going until the desired blocklength is reached.

[Figure: eight copies of W combined by recursive application of the basic transformation.]
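For the BEC, the recursion can be traced exactly, because each synthetic channel is again a BEC: erasure probability e splits into 2e − e² (for W−) and e² (for W+), as in Example 1. The following sketch assumes that BEC special case only; the function names and the threshold delta are hypothetical choices made for illustration.

```python
def polarize_bec(eps, levels):
    """Erasure probabilities of the 2**levels synthetic channels of BEC(eps)."""
    probs = [eps]
    for _ in range(levels):
        # Each channel with erasure probability e is duplicated and transformed
        # into W- (erasure 2e - e^2) and W+ (erasure e^2).
        probs = [q for e in probs for q in (2 * e - e * e, e * e)]
    return probs


def unpolarized_fraction(eps, levels, delta=0.01):
    """Fraction of synthetic channels with capacity strictly inside (delta, 1 - delta)."""
    caps = [1 - e for e in polarize_bec(eps, levels)]
    return sum(delta < c < 1 - delta for c in caps) / len(caps)


if __name__ == "__main__":
    eps = 0.5
    for levels in (2, 4, 8, 12, 16):
        caps = [1 - e for e in polarize_bec(eps, levels)]
        good = sum(c > 0.99 for c in caps) / len(caps)
        print(levels, good, unpolarized_fraction(eps, levels))
```

The average capacity stays at 1 − ε at every level (conservation of information), while the printed fractions show how many channels have moved close to capacity 1 and how many remain in between as the blocklength N = 2^levels grows.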
