Fixed Binning Schemes for Channel and Source Coding Problems: An Operational Duality

Hua Wang and Pramod Viswanath∗

September 30, 2003

Abstract

We explore the connection between channel and source coding problems. Our study centers on the class of binning schemes that characterize the largest known achievable rates for channel and source coding problems with partial side information and multiterminal scenarios. The binning scheme can be viewed as a set of bins, with each bin forming a channel code (with error probability ε2) and the set of all the codewords in the bins forming an overall channel code (with error probability ε1). Our main result is a characterization of the performance of deterministic binning schemes for the channel and source coding problems solely as a function of the error probabilities ε1, ε2. This characterization shows that the roles of the two error probabilities are reversed for the channel and source coding problems, and suggests an operational duality between the two problems with respect to binning schemes. The algorithmic implications for constructing binning schemes are also discussed.

1 Introduction

Information theory is the study of the fundamental limits of data communication and compression. Shannon characterized these limits in the most basic setting and pointed out a “curious and provocative duality” between the problems of data compression with a distortion measure and communication over a noisy channel (in [14]).

The noisy channel with transmission alphabet X and receive alphabet Y is represented by the conditional probability measure PY|X. For a fixed distribution on the transmission alphabet (say, PX) that meets on average any constraint on the transmitted signal, a random coding argument shows that a communication rate R < I(X;Y) can be achieved. The data compression problem consists of a source (with alphabet Y and probability measure PY) which has to be compressed (to bits) and reconstructed (in an alphabet X) so that the average distortion under a given distortion measure satisfies some distortion constraint. With a conditional distribution PX|Y under which the average distortion meets the distortion constraint, a random coding argument shows that a compression rate R > I(X;Y) can be achieved.

∗The authors are with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign, Urbana IL 61801; email: {huawang,pramodv}@uiuc.edu. This research was sponsored in part by NSF ITR 00-85929 and NSF CCR-0325924.

The appearance of I(X;Y) in the rate expression for both problems shows a “formula level duality”. Furthermore, from the similar random coding proofs of these two problems, [11] points out a “random coding level duality”. This is in the sense that the random codebook in both problems is constructed using the same marginal distribution PX, and encoding in the channel coding problem is the same as decoding in the source coding problem and vice versa.

Fix a joint probability distribution PXY on two random variables X and Y and consider a pair of channel coding and source coding problems as shown in Figure 1. In the channel coding problem, the maximal probability of decoding error (denoted Pe) is the performance measure for any coding scheme. The analog in the source coding problem is the probability of the set of source sequences that do not satisfy the distortion constraint, which we define as the probability of distortion violation (denoted Pdv). When Pdv goes to zero, the given distortion constraint is met provided the distortion measure is uniformly bounded. Consider any coding scheme whose codebook empirical distribution matches PX, whose maximal error probability satisfies Pe ≤ ε, and to which no extra codeword can be added without violating the error probability criterion (we refer to such a code as maximal). Then, it is known that using any such maximal code for the source coding problem with encoders and decoders reversed gives Pdv ≤ 1 − ε (Chapter 2.1 of [8], the information theory book by Csiszar and Korner). Thus, a relation exists between the performances of a fixed maximal coding strategy used for the channel and source coding problems in Figure 1 (with the encoding and decoding operations reversed for the two problems). We call this connection an operational duality: a maximal coding scheme with a high error tolerance ε behaves as a source code with a low Pdv.
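Stated compactly (this is only a restatement of the bound just cited; the vanishing slack term γn is made explicit in Section 2.3):

```latex
% One fixed maximal (n, \varepsilon) code (f, \varphi), used in both directions:
\underbrace{P_e(f,\varphi) \,\le\, \varepsilon}_{\text{used as a channel code}}
\qquad\text{and}\qquad
\underbrace{P_{dv} \,\le\, 1-\varepsilon+\gamma_n}_{\text{used in reverse as a source code}},
\qquad \gamma_n \to 0 \text{ as } n \to \infty .
```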

Figure 1: Classical channel coding problem and source coding problem.

The notions of “formula level duality” and “random coding level duality” have been observed in coding problems with side information as well [4, 6, 11, 15]. Channel coding with side information known noncausally at the encoder (the Gel’fand-Pinsker problem) is solved in [10]. In this problem, the channel is described by PX|X̂S, where X̂ is the input, X is the output, and the side information S has distribution PS. For an auxiliary random variable U with conditional distribution PU|S and a deterministic function h : U × S → X̂ such that X → (X̂, S) → U forms a Markov chain, a communication rate R < I(U;X) − I(U;S) is achievable by a random binning scheme.


Figure 2: The Gel’fand-Pinsker problem. The channel code from U to S with Pe ≤ ε2 serves as a source code for the source S and reproduction alphabet U with Pdv ≤ 1 − ε2. The overall probability of error is Pe ≤ (1 − ε2) + (ε1)^{1/4}.

Figure 3: The Wyner-Ziv problem. The channel code from U to X with Pe ≤ ε1 serves as a source code for the source X and reproduction alphabet U with Pdv ≤ 1 − ε1. The overall probability of distortion violation is Pdv ≤ (1 − ε1) + (ε2)^{1/4}.

Source coding with side information at the decoder (the Wyner-Ziv problem) is solved in [16]. In this problem, the source X and the side information S have a joint distribution PXS, X̂ is the reproduction alphabet, and the distortion measure is given by d(x, x̂). Given a distortion constraint D, for an auxiliary random variable U with conditional distribution PU|X and a deterministic function h : U × S → X̂ such that S → X → U forms a Markov chain and E[d(X, X̂)] ≤ D, a compression rate R > I(U;X) − I(U;S) is achievable by a random binning scheme. The “formula level duality” between these two problems can be seen via the appearance of I(U;X) − I(U;S) in both problems, and the “random coding level duality” follows from the similarity of the random binning achievability arguments.

Our main result is an “operational duality” (in the sense of Csiszar and Korner) for the channel and source coding problems with partial side information. More specifically, we consider a class of deterministic maximal binning schemes: each bin corresponds to a maximal channel code from U to S (with error probability ε2) and the collection of the codewords in all the bins forms a maximal channel code from U to X (with error probability ε1). The channel codewords have empirical distribution PU. This binning structure can be used in both side information coding problems with a common rate I(U;X) − I(U;S), and our main result is the performance characterization for the Gel’fand-Pinsker problem shown in Figure 2 and the Wyner-Ziv problem shown in Figure 3:

1. For the Gel’fand-Pinsker problem, Pe ≤ (1 − ε2) + (ε1)^{1/4}.

2. For the Wyner-Ziv problem, Pdv ≤ (1 − ε1) + (ε2)^{1/4}.

Our notion of operational duality is a contrast in the behavior of the source and channel coding problems for a fixed maximal binning strategy: a maximal binning scheme with large ε2 and small ε1 behaves as a good Gel’fand-Pinsker code, and vice versa for the Wyner-Ziv problem. Strategies other than the binning scheme are not considered here, and the significance of the converse for these two problems (which shows that the binning schemes are optimal) is also not exploited. In this sense, our result does not contribute to an understanding of the role of “rate loss” (when compared with full side information) in these two partial side information problems; Section 3.3 of [11] discusses the “nonuniqueness” of duals viewed at the random coding level, and [17] considers the issues with the Gaussian partial side information problem.

Our construction of the binning scheme from the individual channel codes is greedy and provides an alternative proof of the coding theorems for the two partial side information problems. Binning schemes are also used in multiterminal situations: the broadcast channel and the distributed source coding problem. A “random coding level duality” between the deterministic broadcast channel and lossless distributed source coding is observed in [5]. Our study of the partial side information problems allows us to extend the operational duality between a class of strategies for the broadcast channel [9] and a class of strategies for the multiterminal source coding problem [2].

We begin the paper with a careful statement of the operational duality in the classical channel and source coding problems. This sets the stage for our main result by making precise the class of coding strategies considered and by building the notation and preliminary results. We do this in Section 2, where we strengthen the classical results in Chapter 2.1 of [8]. Our main result, a greedy deterministic construction of binning schemes and an evaluation of its performance on the source and channel coding problems, is the topic of Section 3. Section 4 discusses the behavior of fixed binning schemes for broadcast channels and distributed source coding. We conclude with a discussion of our results: a comparison of the operational duality we derive here with previous work, and the impact on the construction of practical binning schemes. We aim to convey the central ideas in the main part of the paper, relegating the details to the appendices.


2 Coding for the classical channel and source coding problems

In this section we make precise the class of coding strategies considered for the corresponding channel and source coding problems. Our goal is to study the behavior of a fixed coding strategy for both the channel and source coding problems (with the roles of encoder and decoder reversed). The material here is essentially a review of results from the information theory textbook of Csiszar and Korner [8], but serves two purposes. First, it sets the stage for the side information problem we consider by developing the notation and by building intuition towards what to expect in the side information case. Second, we need strengthened versions of the results of [8] (for our main result), and this section builds these preliminary tools.

We use the standard notation of information theory textbooks; in particular, close adherence is maintained to the notation in [8]. For example, we denote the set of sequences of type P in X^n by TP, the set of P-typical sequences in X^n by T[P] or T[X], and the set of Y|X-typical or Y|X-generated (by x ∈ X^n) sequences in Y^n by T[Y|X](x).

2.1 The Setting

The random coding scheme for the basic channel coding and source coding problems is standard material in information theory textbooks. A “random coding level duality” exists between the two problems in the sense that the random codebook is constructed using the same marginal distribution PX and encoding in the channel coding problem is the same as decoding in the source coding problem and vice versa [11]. Instead of considering the random codebook ensemble, we want to explore the connection between the channel coding and source coding problems for a given codebook. We want to work with a set of codebooks that is easy to handle and yet represents most of the codebooks of interest. Thus we incorporate the following constraints on the coding scheme into the formulation of the channel coding and source coding problems we consider.

For a given joint probability distribution PXY on two random variables X and Y, consider a pair of channel coding and source coding problems as shown in Figure 1. In this section, all conditional and marginal distributions are derived from the joint probability distribution PXY. The channel coding problem is as follows:

• The channel input alphabet is X and the output alphabet is Y; the channel is described via the conditional distribution PY|X.

• The marginal distribution PX on the input alphabet meets, on average, a desired constraint on the transmitted signal.

• The empirical distribution of each codeword is close to PX. More precisely, the codewords are PX-typical.

• Given a received sequence y, the decoded codeword x is jointly typical with y.


Note that we consider only those coding schemes whose codewords have a specified empirical distribution. A codebook that achieves the maximal mutual information for a given channel has been shown to have this empirical distribution property (in a certain divergence sense) [13]. Moreover, joint typicality of the received sequence and the decoded codeword is a reasonable property to expect from a good decoder. Similarly, the source coding problem is as follows:

• The source alphabet is Y and the reproduction alphabet is X; the source distribution is the marginal PY.

• The distortion measure is a uniformly bounded function d(y, x) and the distortion constraint is D > 0; the conditional distribution PX|Y satisfies E[d(X, Y)] < D.

• The empirical distribution of each reproduction sequence is close to PX. More precisely, the reproduction sequences are PX-typical.

• Given a source sequence y, the reproduction sequence x is jointly typical with y.

We are now ready to consider the behavior of a specific coding scheme (that meets the requirements for both the channel and source coding problems). We are also interested in considering coding schemes with the largest rate possible while keeping the maximal error probability for the channel no more than ε ∈ (0, 1).

2.2 A Greedy Code Construction

Consider the following code construction that greedily adds codewords (and corresponding decision regions) until one stops due to the error probability criterion. Let us denote the block length by n, the encoding map (from messages to codewords in X^n) by f, and the decoder map (from received sequences in Y^n to messages) by φ.

1. Choose an arbitrary PX-typical sequence x1 as the codeword for the first message m1. The decision region A1 for m1 is T[Y|X](x1), i.e., all of the typical sequences generated by x1. In our notation, A1 = φ^{-1}(m1).

2. Suppose we have the ith codeword and the corresponding decision region Ai = φ^{-1}(mi). We add the (i+1)th codeword if we can find a PX-typical x such that

PY|X( T[Y|X](x) − ∪_{j=1}^{i} Aj | x ) ≥ 1 − ε,    (1)

in which case we set f(m_{i+1}) = x and A_{i+1} = T[Y|X](x) − ∪_{j=1}^{i} Aj. That is, the decision region for the (i+1)th codeword is all of the typical sequences generated by it except those already in the decision regions of previous codewords. This ensures that the decision regions of different messages are disjoint and, furthermore, that the maximal error probability is no more than ε.


3. The code construction stops when we cannot find an x with a corresponding decision region large enough to meet the error probability criterion.
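As an illustration of steps 1–3, here is a minimal sketch in Python (a toy, not the paper's construction verbatim: `candidates` stands in for the PX-typical sequences, `gen_outputs` for T[Y|X](·), and `cond_prob` for the conditional measure in criterion (1); the BSC demo at the bottom is likewise hypothetical):

```python
from itertools import product

def greedy_code(candidates, gen_outputs, cond_prob, eps):
    """Greedily add codewords with pairwise-disjoint decision regions (steps 1-3)."""
    codebook, regions, used = [], [], set()
    for x in candidates:
        region = gen_outputs(x) - used            # strip already-claimed outputs
        if cond_prob(region, x) >= 1 - eps:       # criterion (1)
            codebook.append(x)
            regions.append(region)
            used |= region
    return codebook, regions

# Toy demo: BSC(p), block length n; the "generated outputs" of x are the sequences
# within Hamming distance d of x (a crude stand-in for T_[Y|X](x)).
n, p, d, eps = 7, 0.05, 1, 0.3
seqs = list(product((0, 1), repeat=n))
ham = lambda a, b: sum(u != v for u, v in zip(a, b))
gen = lambda x: {y for y in seqs if ham(x, y) <= d}
prob = lambda ys, x: sum(p**ham(x, y) * (1 - p)**(n - ham(x, y)) for y in ys)
code, regs = greedy_code(seqs, gen, prob, eps)
print(len(code), "codewords with disjoint decision regions")
```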

Figure 4: Channel code. The ellipse is the set of codewords, each dot is a codeword, Ai is the decision region for message mi, and B is the union of the decision regions of the messages.

This algorithm greedily adds PX-typical sequences satisfying the error criterion Pe ≤ ε to the codebook. We say that the code denoted by the encoder-decoder maps (f, φ) is an (n, ε) code for the discrete memoryless channel (DMC) PY|X. We also denote the set of messages by Mf. The key observation of Lemma 2.1.3 of [8] is that the greedy code construction provides the largest rate possible. We state this result below in a strengthened form.

Theorem 1. For every discrete memoryless channel (DMC) {X, PY|X, Y} and distribution PX on X, we can greedily construct an (n, εn)-code (f, φ) such that for every m ∈ Mf,

(i) f(m) ∈ T[PX],    (2)
(ii) φ^{-1}(m) ⊂ T[Y|X](f(m)),    (3)

and the rate of the code is at least I(X;Y) − αn, where αn → 0 and εn → 0 as n → ∞.

The proof of this refined version of Lemma 2.1.3 of [8] (the Maximal Code Lemma) is in Appendix B. The proof also shows the dependence of αn and εn on the block length n. The following theorem shows that this is the largest rate possible even if the codewords were carefully picked (rather than greedily). This is a refined version of Lemma 2.1.4 of [8] and the proof is in Appendix C.

Theorem 2. For every ε ∈ (0, 1) and block length n, any (n, ε) code for the DMC {X, PY|X, Y} such that the codewords belong to T[PX] can have rate no more than I(X;Y) + βn, where βn → 0.

2.3 Behavior on the Source Coding Problem

Let us fix an (n, ε) maximal code (denoted by (f, φ)) with the largest rate possible (constructed greedily as above). This provides a maximal error probability of no more than ε on the channel. How does this code (with the encoders and decoders reversed) do on the source coding problem? This is answered in Chapter 2.2 of [8] by the statement that Pdv ≤ 1 − ε. Here we briefly go over the simple steps of the proof.

Let us denote the union of the decision regions of the messages in the channel code by

B ≜ ∪_{m∈Mf} φ^{-1}(m) ⊂ Y^n.    (4)

From the construction of each decision region, the union of the decision regions must have a positive measure with respect to the marginal distribution PY. In particular, one can show that

P^n_Y(B) ≥ ε − γn,    (5)

where γn → 0 as n → ∞. Furthermore, from the code construction we have that for y ∈ B and x ≜ f(φ(y)), x ∈ T[X] and y ∈ T[Y|X](x). This means that for n large enough, every y ∈ B is reproduced with distortion less than D. Formally,

d(y, f(φ(y))) = (1/n) ∑_{a∈Y, b∈X} N(a, b | y, x) d(a, b) ≤ E[d(X, Y)] + γn.    (6)

Thus the probability of distortion violation is no more than the measure of the complement of the set B, which is at most 1 − ε + γn.
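As a toy check of this reversal, one can reuse the decision regions produced by a construction like the `greedy_code` sketch above and estimate P^n_Y(B^c), which bounds Pdv, by Monte Carlo (`draw_y`, an assumed sampler for P^n_Y, is hypothetical):

```python
def source_encode(y, regions):
    """Channel code used in reverse: map a source sequence y to the message
    whose decision region contains it (None if y falls outside B)."""
    for m, region in enumerate(regions):
        if y in region:
            return m
    return None  # y is in the complement of B

def estimate_pdv_bound(draw_y, regions, trials=10000):
    """Monte-Carlo estimate of P_Y^n(B^c); by (6), for n large enough this
    upper-bounds the probability of distortion violation."""
    misses = sum(source_encode(draw_y(), regions) is None for _ in range(trials))
    return misses / trials
```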

2.4 Operational Duality

The development so far can be summarized as follows.

• The joint distribution PXY is the starting point of this duality. It defines a pair of classical channel and lossy source coding problems.

• The same maximal codebook is used in both problems, and the operations of encoding and decoding are reversed.

• Pdv in the source coding problem is the analog of Pe in the channel coding problem. A maximal (n, ε) channel codebook used in the source coding problem with encoders and decoders swapped provides Pdv ≤ 1 − ε.

The pair of channel coding and source coding problems is defined from the joint probability distribution as in Section 2.1. In particular, we consider the greedily constructed codebook, which is a fixed codebook. The quantitative relationship between Pe and Pdv is the operational duality; this cannot be arrived at through the “random coding level duality”.

3 Coding with Partial Side Information

We now address the main focus of this paper: channel and source coding problems with partial side information. Achievable rates for channel coding with transmitter side information and source coding with receiver side information are calculated through random binning schemes. Our goal is to consider a class of maximal deterministic binning schemes and evaluate their performance on the corresponding channel and source coding problems. We first point out the random coding level duality and arrive at a precise definition of the class of binning schemes and their corresponding partial side information problems. We show that a greedy construction of the binning scheme explores the entire space of interesting binning schemes and the resulting codewords have maximal rates. Finally, we evaluate the performance of a maximal binning scheme on the channel and source coding problems, arriving at the operational duality.

3.1 The Setting

Channel coding with side information known noncausally at the encoder (the Gel’fand-Pinsker problem) is solved in [10]. In this problem, the channel is described by PX|X̂S(x|x̂, s), where X̂ is the input, X is the output, and the side information S has distribution PS. Given an auxiliary random variable U with conditional distribution PU|S and a deterministic function h : U × S → X̂ such that X → (X̂, S) → U forms a Markov chain, a random binning argument shows that the communication rate

I(U;X) − I(U;S)    (7)

is achievable: Given the distribution PU|S and the function h, generate e^{nI(U;X)} i.i.d. sequences U^n according to ∏_{i=1}^{n} P(ui) and bin them randomly into e^{n(I(U;X)−I(U;S))} bins. Given side information s and message m ∈ {1, 2, ..., e^{n(I(U;X)−I(U;S))}}, the encoder looks in bin m for a sequence u which is jointly typical with s and transmits x̂ = h(u, s). The decoder receives the channel output x and looks for a unique u among the e^{nI(U;X)} U^n sequences such that u is jointly typical with x. The index of the bin which contains this u is decoded as the message m.
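Schematically, and purely as a toy (the codebook, the `jointly_typical` test, and the function `h` are assumed stand-ins for the objects above), the random binning encoder and decoder look like this:

```python
import random

def build_bins(u_sequences, num_bins, seed=0):
    """Randomly distribute the candidate U^n sequences into bins."""
    rng = random.Random(seed)
    bins = [[] for _ in range(num_bins)]
    for u in u_sequences:
        bins[rng.randrange(num_bins)].append(u)
    return bins

def gp_encode(m, s, bins, jointly_typical, h):
    """Look in bin m for a u jointly typical with the state s; send h(u, s)."""
    for u in bins[m]:
        if jointly_typical(u, s):
            return [h(ui, si) for ui, si in zip(u, s)]   # x_hat, letter by letter
    return None  # encoding error

def gp_decode(x, bins, jointly_typical):
    """Find the unique u (across all bins) jointly typical with the output x;
    the index of its bin is declared as the message."""
    hits = [m for m, b in enumerate(bins) for u in b if jointly_typical(u, x)]
    return hits[0] if len(hits) == 1 else None           # error unless unique
```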

Source coding with side information at the decoder (the Wyner-Ziv problem) is solved in [16]. In this problem, the source X and the side information S have joint distribution PXS, X̂ is the reproduction alphabet, the distortion measure is given by d(x, x̂), and the distortion constraint is D. Given an auxiliary random variable U with conditional distribution PU|X and a deterministic function h : U × S → X̂ such that S → X → U forms a Markov chain and E[d(X, X̂)] ≤ D, a random binning argument shows that the compression rate

I(U;X) − I(U;S)    (8)

is achievable. The random coding strategy is as follows: given the distribution PU|X and the function h, generate e^{nI(U;X)} i.i.d. sequences U^n according to ∏_{i=1}^{n} P(ui) and distribute them randomly into e^{n(I(U;X)−I(U;S))} bins. Given the source sequence x, the encoder looks for a unique u among the e^{nI(U;X)} U^n sequences such that u is jointly typical with x. The source is compressed to the index of the bin (denoted by m) that contains this u. Given side information s and index m ∈ {1, 2, ..., e^{n(I(U;X)−I(U;S))}}, the decoder looks in bin m for a sequence u which is jointly typical with s and generates the reproduction sequence x̂ = h(u, s).
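The Wyner-Ziv maps are the same searches with the roles swapped; continuing the toy sketch above:

```python
def wz_encode(x, bins, jointly_typical):
    """Covering step: find a u jointly typical with the source x; the compressed
    index is the bin that contains it (this mirrors gp_decode's search)."""
    for m, b in enumerate(bins):
        for u in b:
            if jointly_typical(u, x):
                return m
    return None  # no covering codeword: distortion likely violated

def wz_decode(m, s, bins, jointly_typical, h):
    """Look in bin m for the u jointly typical with the side information s and
    emit the reproduction h(u, s) (this mirrors gp_encode's search)."""
    for u in bins[m]:
        if jointly_typical(u, s):
            return [h(ui, si) for ui, si in zip(u, s)]
    return None
```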


Inspired by the discussion in the previous section and the random binning scheme described above, we want to explore the connection between a pair of channel coding and source coding problems defined below. Consider a fixed joint probability distribution PXX̂SU on the random variables X, X̂, S and U having the following properties.

1. The value of the conditional distribution PX̂|SU can only be 0 or 1 and is determined by a deterministic function h : U × S → X̂.

2. X → (X̂, S) → U forms a Markov chain.

3. S → X → U forms a Markov chain.

Henceforth in this section all the marginal and conditional probabilities are generated from this joint probability distribution PXX̂SU and all averages are also calculated with respect to this joint distribution. The Gel’fand-Pinsker problem with constraints on the binning scheme is the following, and is also shown in Figure 2. The constraint on the encoding is that the empirical distributions be met, and the decoding constraint is a natural one.

• The channel is described by PX|X̂S, where X̂ is the input, X is the output, and the side information S has distribution PS.

• The auxiliary random variable U has conditional distribution PU|S.

• The marginal distribution PX̂ on the input alphabet meets, on average, the given constraint on the transmitted signal.

• Given a message m and side information s, the encoder first uses joint typicality to find a PU-typical sequence u, then generates the input sequence x̂ = h(u, s).

• The decoder receives x and uses joint typicality decoding to determine a u, and then uses this u to decide which message was sent.

The Wyner-Ziv problem with constraints on the binning scheme is the following, and is also shown in Figure 3. The constraints are analogous to the ones imposed in the classical lossy source coding in Section 2.1.

• The source X and the side information S have joint distribution PXS; X̂ is the reproduction alphabet.

• The auxiliary random variable U has conditional distribution PU|X.

• Given the distortion measure d(x, x̂) and the distortion constraint D, we have E[d(X, X̂)] ≤ D.

• Given a source sequence x, the encoder uses joint typicality to determine a u, and then uses this u to decide which index is assigned to that x.


• Upon receiving an index m and the side information s, the decoder first uses joint typicality to find a PU-typical sequence u, then generates the reproduction sequence x̂ = h(u, s).

3.2 Code Construction for Coding with Side Information

For the three random variables U, X and S, the random binning scheme for the Gel’fand-Pinsker problem shows that each bin is a source codebook for the discrete memoryless source S with reproduction alphabet U, whereas the set of all randomly generated U^n sequences is a channel codebook for the DMC from U to X. In the Wyner-Ziv problem, each bin is a channel codebook for the DMC from U to S and the set of all randomly generated U^n sequences is a source codebook for the DMS X with reproduction alphabet U. In order to explore the relationship between the Gel’fand-Pinsker and the Wyner-Ziv problems, the first step is to construct a deterministic version of the random binning scheme. Consider the following view of the binning scheme.

1. We have an (n, ε1) channel code for the DMC from U to X with rate R ≈ I(U;X), where the codewords are PU-typical and the decoding scheme is joint typicality decoding.

2. The e^{nI(U;X)} codewords can be split into approximately e^{n(I(U;X)−I(U;S))} bins, where each bin is an (n, ε2) channel code for the DMC from U to S with rate R ≈ I(U;S) and the decoding scheme is joint typicality decoding.

Since we have a way to construct a source codebook from a channel codebook, by changing the values of ε1 and ε2, the code construction above can be used for both the Gel’fand-Pinsker and the Wyner-Ziv problems. In constructing a deterministic code, we do not have the luxury of randomly generating codewords and randomly binning them. So, we characterize binning schemes with the structure above by generating them greedily, in much the same way as in Section 2.2. Now there are three random variables U, X, S and two DMCs {U, PX|U, X} and {U, PS|U, S}.

1. Use the greedy algorithm to construct an (n, ε1) code (f^(1), φ^(1)) for the DMC {U, PX|U, X} and an (n, ε2) code (f^(1), ψ^(1)) for the DMC {U, PS|U, S} with the same encoder map f^(1). Here we add a typical sequence u to the codebook as long as the Pe's for both channels satisfy the given probability of error constraints. The construction stops when such a u cannot be found. Thus, we have the first bin, which contains a maximal pair of codes.

2. Now suppose we already have the ith bin and let E^(i) denote the set of common codewords in the ith bin. Define the union of the decision regions for the DMC {U, PX|U, X} corresponding to the codewords in the ith bin by A^(i). That is,

A^(i) = ∪_{m∈Mf^(i)} (φ^(i))^{-1}(m).    (9)


Let us fix τ1 ∈ (0, ε1) and let C^(i) be the set of typical u which are not possible choices of codewords for the DMC from U to X, i.e.,

C^(i) = { u ∈ T[U] : P^n_{X|U}( ∪_{k=1}^{i} A^(k) | u ) > ε1 − τ1 }.

We now greedily construct the (i+1)th bin, consisting of a maximal pair (f^(i+1), φ^(i+1)), an (n, ε1) code for the DMC {U, PX|U, X}, and (f^(i+1), ψ^(i+1)), an (n, ε2) code for the DMC {U, PS|U, S}. The two codes have the same encoder (the codewords in the (i+1)th bin), whose codewords are chosen from the set T[U] − C^(i), and the decision region A^(i+1) is disjoint from ∪_{j≤i} A^(j).

3. Our construction stops when a new bin cannot be created. This happens when no u in T[U] − C^(M) can be a codeword without violating the error probability criterion. Here M denotes the index of the last bin.

The greedy code construction is shown in Figure 5. This construction exhausts a wide class of deterministic binning strategies. Since it is constructed greedily, it is not clear a priori whether there are enough bins, nor whether each bin has enough codewords. Analogous to Theorem 1, we show that every such greedily constructed binning strategy generates a set of bins with the appropriate rate: i.e., there are approximately exp{n(I(U;X) − I(U;S))} bins and each bin has approximately exp{nI(U;S)} codewords. With this, we have set the stage to use a deterministic binning scheme for the Gel’fand-Pinsker and Wyner-Ziv problems and evaluate its performance (as a function of the key parameters ε1 and ε2); a sketch of the two-level construction is given after Figure 5.

Figure 5: Code construction for coding with side information. Each circle represents a bin; the U^n sequences in it form the common codeword set of an (n, ε1) code for the DMC {U, PX|U, X} and an (n, ε2) code for the DMC {U, PS|U, S}. The union of the U^n sequences in all the bins is also the codeword set of an (n, ε1) code for the DMC {U, PX|U, X}.
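Algorithmically, the two-level construction can be sketched as follows (again a toy: `candidates` stands for T[U], and `gen_x`/`prob_x`, `gen_s`/`prob_s` are assumed oracles for the decision regions and conditional measures of the two DMCs; the skip test approximates membership in C^(i)):

```python
def greedy_bins(candidates, gen_x, prob_x, gen_s, prob_s, eps1, eps2, tau1):
    """Two-level greedy binning (a toy rendering of steps 1-3 above)."""
    bins = []
    used_x = set()       # union of the A^(k): X-space regions claimed by any bin
    assigned = set()     # codewords already placed in some bin
    while True:
        bin_cw, used_s = [], set()    # fresh S-space regions for the new bin
        for u in candidates:
            if u in assigned or prob_x(used_x, u) > eps1 - tau1:
                continue              # u lies in C^(i): excluded from new bins
            rx = gen_x(u) - used_x    # candidate decision region for U -> X
            rs = gen_s(u) - used_s    # candidate decision region for U -> S
            if prob_x(rx, u) >= 1 - eps1 and prob_s(rs, u) >= 1 - eps2:
                bin_cw.append(u)
                assigned.add(u)
                used_x |= rx
                used_s |= rs
        if not bin_cw:                # no new bin can be created: stop
            return bins
        bins.append(bin_cw)
```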

In the following theorem, we formalize our discussion so far; the proof is in Appendix D.


Theorem 3. Let the random variables U, X, S have joint distribution PUSX and let the conditional distributions PX|U and PS|U be generated from PUSX. Let {U, PX|U, X} and {U, PS|U, S} be two DMCs. For any ε1, ε2 ∈ (0, 1) such that I(U;X) > I(U;S), and any τ > 0 satisfying I(U;X) > I(U;S) + 4τ, there exists an M satisfying

(I(U;X) − I(U;S)) − 4τ ≤ (1/n) log M ≤ (I(U;X) − I(U;S)) + 4τ,    (10)

such that after using the greedy algorithm to construct M bins, we have a family of (n, ε1)-codes (f^(i), φ^(i)) for the DMC {U, PX|U, X} and a family of (n, ε2)-codes (f^(i), ψ^(i)) for the DMC {U, PS|U, S}, where i = 1, ..., M, such that

1. for each index i (i = 1, ..., M), the ith encoders in the two families of codes are the same, namely f^(i) : Mf^(i) → T[U];

2. the decoders satisfy

(φ^(i))^{-1}(m) ⊂ T[X|U](f^(i)(m)),
(ψ^(i))^{-1}(m) ⊂ T[S|U](f^(i)(m)),    ∀ m ∈ Mf^(i), i = 1, ..., M;    (11)

furthermore, the decision regions of the messages for the different codes in the codebook family (f^(i), φ^(i)) for the DMC {U, PX|U, X} are disjoint;

3. for large enough n, the rate of each codebook satisfies

(1/n) log |Mf^(i)| ≥ I(U;S) − 2τ,  i = 1, ..., M − 1.    (12)

3.3 Coding for the Gel’fand-Pinsker Problem

Our coding scheme is as follows:

• Codebook Construction. Consider a family of bins constructed as in the previous section with the replacement of ε1 by a sequence ε1,n (with ε1,n → 0 as n → ∞). Both the sender and the receiver know the code construction.

• Encoding. Given message m ∈ {1, ..., M − 1} and side information s, the sender looks in the mth bin for a u whose decision region contains s, i.e., u = f^(m)(ψ^(m)(s)). The transmitted sequence is then x̂ = h(u, s).

• Decoding. The receiver receives x and looks in all the bins for a u such that u = f^(m)(φ^(m)(x)), i.e., x lies in the decision region of some u in the mth bin. When such a u is found, the bin index m is declared to be the message.
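In code form (toy stand-ins: `enc[m]` for f^(m), `dec_phi[m]` for φ^(m) and `dec_psi[m]` for ψ^(m), each returning None outside its decision regions), the encoder and decoder read:

```python
def gp_det_encode(m, s, enc, dec_psi, h):
    """Deterministic GP encoding: decode the state s inside bin m with psi^(m),
    map the resulting message index back to its codeword u, transmit h(u, s)."""
    msg = dec_psi[m](s)            # psi^(m)(s); None if s is outside bin m's regions
    if msg is None:
        return None                # encoding failure
    u = enc[m](msg)                # u = f^(m)(psi^(m)(s))
    return [h(ui, si) for ui, si in zip(u, s)]

def gp_det_decode(x, dec_phi):
    """Deterministic GP decoding: the X-space decision regions of all bins are
    disjoint (Theorem 3), so at most one bin m claims the received x."""
    for m, phi in enumerate(dec_phi):
        if phi(x) is not None:
            return m
    return None                    # decoding failure
```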


From Theorem 3, we know that the rate of this code satisfies

R > I(U;X) − I(U;S) − 4τ    (13)

for each τ > 0 satisfying I(U;X) > I(U;S) + 4τ. We now characterize the performance of every such deterministic binning scheme by showing an upper bound on the error probability. This is stated formally below; the proof is in Appendix E.

Theorem 4. The maximal probability of error of the deterministic binning scheme for the Gel’fand-Pinsker problem satisfies

Pe ≤ 1 − ε2 + (ε1,n)^{1/4} + γ,    (14)

where γ can be made arbitrarily small.

While the proof in the random binning case is very natural, the inability to average over a random bin makes this proof somewhat more involved. In particular, the subtlety here is that for an input sequence x̂, the probability measure on the output space is given by the conditional distribution PX|X̂S, which comes from PX|US, whereas our code construction is based on the conditional distribution PX|U. In fact, since the deterministic binning scheme is very general and allows the two codes (one from U to S and the other from U to X) to be chosen independently, it is not a priori clear that it does well at all for the Gel’fand-Pinsker problem. One of the important implications of our result is that it is indeed possible to use such binning schemes for the Gel’fand-Pinsker problem.

3.4 Coding for the Wyner-Ziv Problem

We can adapt the deterministic binning strategy for the Wyner-Ziv problem as follows.

• Codebook Construction. Use the binning scheme of the previous section with the replacement of ε2 by a sequence ε2,n (with ε2,n → 0 as n → ∞). Both the encoder and the decoder know the code construction.

• Encoding. Given x, the encoder looks in all the bins for a u such that u = f^(m)(φ^(m)(x)). When such a u is found, the source x is compressed to the bin index m.

• Decoding. Given the index m ∈ {1, ..., M} and side information s, the decoder looks in the mth bin for a u such that u = f^(m)(ψ^(m)(s)). The reconstruction is then x̂ = h(u, s).
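These are the Gel’fand-Pinsker maps with the encoder and decoder exchanged; in the same toy notation as above:

```python
def wz_det_encode(x, dec_phi):
    """Deterministic WZ encoding: the bin whose X-space region contains the
    source x gives the compressed index m (this is gp_det_decode's search)."""
    for m, phi in enumerate(dec_phi):
        if phi(x) is not None:
            return m
    return None                    # x uncovered: distortion may be violated

def wz_det_decode(m, s, enc, dec_psi, h):
    """Deterministic WZ decoding: inside bin m, find the u whose S-space region
    contains s and emit h(u, s) (this is gp_det_encode's search)."""
    msg = dec_psi[m](s)
    if msg is None:
        return None
    u = enc[m](msg)
    return [h(ui, si) for ui, si in zip(u, s)]
```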

From Theorem 3, we know that the rate of this code satisfies

R < I(U;X) − I(U;S) + 4τ.    (15)

Note that given an x ∈ ∪_{i=1}^{M} A^(i) and u = f^(m)(φ^(m)(x)), we have (x, u) ∈ T[XU]. For s ∈ T[S|X](x), from the Markov chain S → X → U, we have (s, x, u) ∈ T[SXU]. As long as u = f^(m)(ψ^(m)(s)), we have (x̂, s, x, u) ∈ T[X̂SXU], and thus the pair (x, x̂) satisfies the distortion constraint. So the probability of distortion violation is the same as the probability of decoding error. The only thing left is to show that the probability of decoding error can be made arbitrarily small. We have the following result, proved in Appendix F.


Theorem 5. The probability of distortion violation of the above binning scheme for the Wyner-Ziv problem satisfies

Pdv ≤ 1 − ε1 + (ε2,n)^{1/4} + γ,    (16)

where γ can be made arbitrarily small.

A subtlety similar to the one in the Gel’fand-Pinsker problem arises here as well. It is not clear, a priori, whether binning schemes with each bin designed independently work. Perhaps each bin, which contains a code from U to S, has to be carefully designed with the channel from U to X in mind. One of our conclusions is that this is not necessary. Technically, the issue here is that given a source sequence x, the probability measure of the side information is given by the conditional distribution PS|X = PS|UX (from the Markov chain). On the other hand, our code construction is based only on the conditional distribution PS|U. This difference does not arise in the random binning proof due to the averaging over all the random bins.

3.5 Operational Duality

We can summarize our discussion with an operational duality between the Gel’fand-Pinsker problem and the Wyner-Ziv problem with respect to a deterministic fixed binning scheme.

• The joint distribution PXX̂SU with the required Markov properties is the starting point of this duality. It defines a pair of Gel’fand-Pinsker and Wyner-Ziv problems.

• Consider a binning scheme with each bin containing an (n, ε2) code from U to S and the codewords in all the bins forming an (n, ε1) code from U to X. We have seen that a greedily constructed binning structure has approximately exp{n(I(U;X) − I(U;S))} bins and each bin has approximately exp{nI(U;S)} codewords.

• Such a binning scheme allows the rate I(U;X) − I(U;S) to be achieved for both side information problems.

• The error probability in the channel coding problem is no more than 1 − ε2 + (ε1)^{1/4}, and the probability of distortion violation in the source coding problem has an exactly reversed dependence on ε1, ε2: it is no more than 1 − ε1 + (ε2)^{1/4}.

The algorithmic implications of the duality are discussed in Section 5.

4 Binning Schemes for Multiterminal Channel and Source Coding Problems

Binning schemes also constitute the best known achievable strategies for certain multiterminal channel and source coding problems. In particular, we are interested in the broadcast channel and the lossy distributed source coding problem. Formula-level and random coding level dualities between these two problems have been pointed out in [5, 12] (more specifically, [5] pointed out this duality only between the deterministic broadcast channel and the lossless distributed source coding problem). In this section, we discuss the performance of a fixed deterministic binning scheme for both of these problems. Our main result is an operational duality, in the same sense as discussed earlier in this paper.

4.1 The Setting

The broadcast channel has a single input X and two outputs X1, X2. The problem is to communicate independent information from the single transmitter to the two (non-cooperating) receivers. The best known achievable rate region is due to Marton [9], and a specific instance is the following. Fix the joint distribution of two auxiliary random variables U1, U2 (as P(u1, u2)) and the input by a deterministic map X(U1, U2). A random binning argument shows that the rate pair (R1, R2) is achievable if

R1 ≤ I(U1;X1),
R2 ≤ I(U2;X2),
R1 + R2 ≤ I(U1;X1) + I(U2;X2) − I(U1;U2).    (17)

The random binning scheme is much the same as in Section 3.1. Generate e^{nI(Ui;Xi)} typical sequences U^n_i according to ∏_{j=1}^{n} P(uij) and randomly bin them into e^{nRi} bins (for i = 1, 2). Given messages (m1, m2), the encoder looks in the product bin (m1, m2) for a jointly typical pair (u1, u2) and transmits x(u1, u2). Receiver i receives xi and looks among all the bins for a unique ui that is jointly typical with xi. The index of the bin which contains this ui is decoded as the message mi (for i = 1, 2).

The lossy distributed source coding problem involves two correlated sources X1, X2 (with distribution P(x1, x2)). They have to be compressed in a distributed manner and reproduced jointly (in reproduction alphabets X̂1, X̂2) such that the average distortion is no more than Di (with distortion measure di(xi, x̂i)) for both i = 1, 2. The best known achievable strategy for this problem is in [2], and a specific instance is the following. We pick auxiliary random variables Ui with conditional distributions P(ui|xi), for i = 1, 2, such that U1 → X1 → X2 → U2 forms a Markov chain. Consider reproduction functions X̂i(U1, U2) satisfying the average distortion constraints, i.e., E[di(Xi, X̂i)] ≤ Di for i = 1, 2. Now a random binning argument shows that the compression rate pair (R1, R2) is achievable if

R1 ≥ I(U1;X1|U2),
R2 ≥ I(U2;X2|U1),
R1 + R2 ≥ I(X1X2;U1U2).    (18)

Consider a fixed joint probability distribution PX1X2XU1U2 on the random variables X1, X2, X, U1, U2, together with deterministic mappings X(U1, U2), X̂1(U1, U2), and X̂2(U1, U2), satisfying


Figure 6: The achievable rate regions for the two multiterminal problems: broadcast channel and distributed source coding. Observe that they share the same sum rate and corner points.

1. The conditional distribution PX|U1U2 is determined by a deterministic mapping X(U1, U2).

2. (U1, U2) → X → (X1, X2) forms a Markov chain.

3. U1 → X1 → X2 → U2 forms a Markov chain.

With these conditions, the two rate regions in (17) and (18) are depicted in Figure 6. With this, the formula level duality is clear: the two rate regions share the same two corner points (and the same sum rate). The random coding level duality follows from the identical nature of the random binning argument used in the proof. Our aim is to study the behavior of a fixed binning scheme for the two multiterminal coding problems defined precisely below.
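To see the shared corner points numerically, the following self-contained script (illustrative only; the joint pmf is a made-up toy and need not satisfy the Markov conditions above — the point is the corner-point arithmetic of (17) and (18)) computes the relevant mutual informations and prints the two corner points of Figure 6:

```python
from itertools import product
from math import log2

def mutual_info(pxy):
    """I(X;Y) in bits for a joint pmf given as a dict {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in pxy.items() if p > 0)

def pair_marginal(p_joint, i, j):
    """Marginal pmf of coordinates (i, j) of a joint pmf over tuples."""
    out = {}
    for k, p in p_joint.items():
        out[(k[i], k[j])] = out.get((k[i], k[j]), 0.0) + p
    return out

# Toy joint pmf on (u1, u2, x1, x2): correlated bits observed through BSC(q).
q = 0.1
p_u = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_joint = {(u1, u2, x1, x2):
           p * (q if x1 != u1 else 1 - q) * (q if x2 != u2 else 1 - q)
           for (u1, u2), p in p_u.items()
           for x1, x2 in product((0, 1), repeat=2)}

i11 = mutual_info(pair_marginal(p_joint, 0, 2))   # I(U1;X1)
i22 = mutual_info(pair_marginal(p_joint, 1, 3))   # I(U2;X2)
i12 = mutual_info(pair_marginal(p_joint, 0, 1))   # I(U1;U2)
print("corner A:", (i11 - i12, i22))
print("corner B:", (i11, i22 - i12))
print("shared sum rate:", i11 + i22 - i12)
```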

All the marginal and conditional probabilities in this section are generated from the joint probability distribution PX1X2XU1U2. Further, the deterministic mappings X(U1, U2), X̂1(U1, U2), X̂2(U1, U2) and all averages are also calculated with respect to this joint distribution. The broadcast channel coding problem with constraints on the coding scheme is the following. The constraint on the encoding is that the empirical distributions be met, and the decoding constraint is a natural one.

• The channel is described by PX1X2|X, where X is the input and X1, X2 are the outputs.

• The auxiliary random variables U1, U2 have the joint distribution PU1U2.


• The marginal distribution PX on the input alphabet meets, on average, the given constraint on the transmitted signal.

• Given the message (m1, m2), the encoder uses joint typicality to find a pair (u1, u2) and then generates the input sequence x(u1, u2).

• Receiver i (i = 1, 2) receives xi and uses joint typicality decoding to determine a ui, which is then used to decide which message was sent.

The distributed source coding problem with constraints on the coding scheme is the following. The constraints are analogous to the ones imposed in the classical lossy source coding in Section 2.1.

• The sources X1 and X2 have joint distribution PX1X2; X̂1 and X̂2 are the corresponding reproduction alphabets.

• The auxiliary random variable Ui has conditional distribution PUi|Xi, for i = 1, 2.

• Given distortion measures d1(x1, x̂1), d2(x2, x̂2) and a pair of distortion constraints (D1, D2), we have E[d1(X1, X̂1)] ≤ D1 and E[d2(X2, X̂2)] ≤ D2.

• Given the source sequence xi, encoder i looks for a jointly typical ui, which is then used to decide which index is assigned to xi, for i = 1, 2.

• Upon receiving an index pair (m1, m2), the decoder looks for the jointly typical pair (u1, u2) and then generates the reproduction sequences x̂1(u1, u2) and x̂2(u1, u2).

The broadcast channel coding and distributed source coding problems thus generated from the joint distribution are illustrated in Figure 7.

4.2 Code Construction

Consider a deterministic binning scheme that achieves the corner point A in Figure 6 as follows.

1. We have an (n, ε1) channel code for the DMC {U1, PX1|U1, X1} with rate R ≈ I(U1;X1), where the codewords are PU1-typical and the decoding scheme is joint typicality decoding.

2. We have an (n, ε3) channel code for the DMC {U2, PX2|U2, X2} with rate R ≈ I(U2;X2), where the codewords are PU2-typical and the decoding scheme is joint typicality decoding. Each codeword corresponds to a message (index) m2.

3. The e^{nI(U1;X1)} codewords can be split into approximately e^{n(I(U1;X1)−I(U1;U2))} bins, where each bin is an (n, ε2) channel code for the DMC {U1, PU2|U1, U2}. Each bin corresponds to a message (index) m1.


Figure 7: The broadcast channel coding problem and the distributed source coding problem.


This code structure can be used in both the broadcast channel coding problem and distributed source coding. Similar to Section 3, instead of using random binning we want to construct a deterministic binning scheme for the four random variables X1, X2, U1 and U2 with three DMCs {U1, PX1|U1, X1}, {U2, PX2|U2, X2} and {U1, PU2|U1, U2}. The idea is to use the general structure of the binning scheme of Section 3 here.

1. Use the greedy algorithm described in Section 2 to construct an (n, ε3)-code (f2^(1), φ2^(1)) for the DMC {U2, PX2|U2, X2}, then greedily construct a maximal family of (n, ε3)-codes (f2^(i), φ2^(i)) for the DMC {U2, PX2|U2, X2} with disjoint codeword sets Ci, i = 1, ..., M2. (This is similar to the second step of the greedy algorithm of Section 3, with only one DMC instead of two.)

2. Use the greedy algorithm described in Section 3, with the role of S replaced by U2, to construct a family of (n, ε1)-codes (f1^(i), φ1^(i)) for the DMC {U1, PX1|U1, X1} and a family of (n, ε2)-codes (f1^(i), ψ1^(i)) for the DMC {U1, PU2|U1, U2}.

The code construction is shown in Figure 8. By using Theorem 3, we can see that each codeword set for the DMC {U2, PX2|U2, X2} satisfies

|Ci| ≥ exp{n(I(U2;X2) − τ)},  i = 1, ..., M2,    (19)

where τ is any value in (0, 1). The binning structure generated by step 2 of this greedy algorithm satisfies the statement of Theorem 3.

4.3 Coding for the Broadcast Channel

The key difference between the use of this binning scheme in the Gel’fand-Pinsker problem and in the broadcast channel is the way the “side information” U2 is constructed: in the side information problem it is generated i.i.d., whereas in the broadcast channel it is picked from the codebook for user 2 (using the message sent to user 2). To make this situation symmetric, consider the following “dithering” idea. We assume that there exists a common random number w shared between the encoder and user 2. w takes values in {1, ..., M2, M2 + 1} with probability distribution {PU2(C1), ..., PU2(CM2), 1 − PU2(C)}, where C denotes the union of the codeword sets Ci. Both the encoder and decoder 2 know w.
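A minimal sketch of the dither, assuming hypothetical helpers: `p_u2(u)` returning the product measure PU2 of a single sequence u, and `code_sets` holding C1, ..., CM2 (0-indexed here for convenience):

```python
import random

def sample_dither(code_sets, p_u2, rng=random):
    """Sample the common random number w: w = i with probability P_U2(C_i)
    (i = 0, ..., M2-1), and w = M2 with the leftover mass 1 - P_U2(C)."""
    weights = [sum(p_u2(u) for u in c) for c in code_sets]
    weights.append(max(0.0, 1.0 - sum(weights)))    # leftover mass for w = M2
    return rng.choices(range(len(weights)), weights=weights)[0]
```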

The coding scheme to achieve corner point A in Figure 6 is as follows:

• Codebook Construction. Consider the code construction as described previously with the replacement of ε1 by a sequence ε1,n and ε3 by a sequence ε3,n (with ε1,n → 0, ε3,n → 0 as n → ∞). Both the encoder and the corresponding decoders know the code construction.

• Encoding. Given the message pair (m1, m2), first choose u2 = f2^(w)(m2); that is, based on the random number w, the message of user 2 is sent using the wth codebook. The encoder then uses the binning scheme to map m1 to a u1 using u2 as the side information, i.e., u1 = f1^(m1)(ψ1^(m1)(u2)). The codeword sent through the channel is x(u1, u2).

• Decoding. User 1, upon receiving x1, looks for u1 = f1^(m1)(φ1^(m1)(x1)) and decides that the message for user 1 is m1. User 2 decodes the message m2 = φ2^(w)(x2).

Figure 8: Code construction for the broadcast channel coding problem and the distributed source coding problem. The part enclosed in the dotted box is the same as Figure 5.

Our main result is the following.

Theorem 6. With this deterministic binning scheme, the maximal probability of error can be upper bounded as follows:

Pe ≤ 1 − ε2 + (ε1,n)^{1/4} + (ε3,n)^{1/4} + γ.    (20)

Here γ > 0 can be made arbitrarily small.

The proof of this theorem is sketched in Appendix G. The calculations are quite analogous to the ones we did in the side information scenario.

4.4 Coding for Distributed Source Coding

As in the broadcast channel coding case, to apply the deterministic binning structure to the distributed source coding problem, we assume that there exists common randomness between encoder 2 and the decoder. Let w be a random variable taking values in {1, ..., M2, M2 + 1} with probability distribution {PU2(C1), ..., PU2(CM2), 1 − PU2(C)}. Both encoder 2 and the decoder know w.

The coding scheme to achieve corner point A in Figure 6 is as follows:

• Codebook Construction. Consider the code construction as described previously with the replacement of ε2 by a sequence ε2,n (with ε2,n → 0 as n → ∞). Both the encoder and the decoder know the code construction.

• Encoding. The two encoders are given x1, x2, respectively. Encoder 1 looks for u1 = f1^(m1)(φ1^(m1)(x1)) and maps x1 to the index m1. Encoder 2 maps x2 to the index m2 = φ2^(w)(x2); that is, based on the random number w, the source sequence of user 2 is compressed using the wth codebook.

• Decoding. Upon receiving the index pair (m1, m2), the decoder first chooses u2 = f2^(w)(m2) based on the random number w. Then the decoder uses the binning scheme to map m1 to a u1 using u2 as the side information, i.e., u1 = f1^(m1)(ψ1^(m1)(u2)). The reproduction sequences are generated as x̂1(u1, u2) and x̂2(u1, u2).

Our main result is the following.


Theorem 7. With the deterministic binning scheme, the probability of distortion violation can be bounded as follows:

Pdv ≤ (ε2,n)^{1/4} + (1 − ε1) + (1 − ε3) + γ.    (21)

Here γ > 0 can be made arbitrarily small.

The proof of this theorem is omitted since, as in the case of Theorem 6, the calculation here is quite analogous to the one in the side information scenario.

4.5 Operational Duality

We can summarize our discussion with an operational duality between the broadcast channel coding problem and the distributed source coding problem with respect to a deterministic fixed binning scheme.

• The joint distribution PX1X2XU1U2 with the required Markov properties is the starting point of this duality. It defines a pair of broadcast channel coding and distributed source coding problems.

• The binning scheme described in Section 3.2, together with the common randomness introduced on U2, serves as the binning structure for both problems. Such a binning scheme allows the corner point (I(U1;X1) − I(U1;U2), I(U2;X2)) to be achieved for both problems.

• The error probability in the broadcast channel coding problem is no more than 1 − ε2 + (ε1)^{1/4} + (ε3)^{1/4}, and the probability of distortion violation in the distributed source coding problem has an exactly reversed dependence on ε1, ε2, ε3: it is no more than (ε2)^{1/4} + (1 − ε1) + (1 − ε3). This is a combination of the operational duality in the classical coding problems and in coding with side information described in the previous two sections.

5 Conclusions

We have investigated the operational duality between source coding and channel coding from a fixed binning strategy point of view. Our work differs from the “random coding level duality” of [11] in the following aspects:

• [11] shows the “random coding level duality” via the reversed roles of the encoder and decoder of a random codebook ensemble for the Gel’fand-Pinsker and Wyner-Ziv problems. Our operational duality is a relation for a deterministic coding scheme instead of a random codebook ensemble. More importantly, we have a quantitative relationship between Pe and Pdv.


• We show the duality between the Gel'fand-Pinsker problem and the Wyner-Ziv problem with the same joint distribution $P_{X\hat{X}SU}$. We neither require the conditional distribution $P_{U|S}$ to be optimal for the Gel'fand-Pinsker problem nor the conditional distribution $P_{U|X}$ to be optimal for the Wyner-Ziv problem, and we do not consider any converse coding theorem. Because we are not working with the capacity or the rate distortion function, the problem of "rate loss" plays no role in the understanding of our duality result.

• We do not need to use the result on side information known at both the encoder and the decoder to show the duality, as is done in [11]. Thus we avoid the situation in which two different Gel'fand-Pinsker problems are dual to one Wyner-Ziv problem, or two different Wyner-Ziv problems are dual to one Gel'fand-Pinsker problem. We always consider a pair of problems, and the dual is unique.

From the fixed binning scheme derived here, we can see an algorithmic connection between coding for the reliable communication problem and the data compression problem. Maximal codebooks, with the parameter $\epsilon$ adjusted according to the purpose of the codebook, work for both the channel coding problem (by choosing $\epsilon$ close to 0) and the source coding problem (by choosing $\epsilon$ close to 1). This suggests that good codes for the channel coding problem can be turned around to form good source codes. Indeed, good channel codes (LDPC codes) have recently been used to construct good source codes for the lossless data compression problem [3]. The appropriate use of maximal codes for lossy compression is an interesting open problem.
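As an illustration of this point, the following sketch (a toy variant of ours in Python, not the exact construction of Section 2.2) builds a code for a small DMC greedily: each new codeword claims the most likely still-unused outputs as its decision region, and is kept only if that region retains conditional probability at least $1-\epsilon$. Choosing $\epsilon$ small yields a reliable channel code; pushing $\epsilon$ toward 1 admits many more codewords, which is the covering behavior a source code needs.

```python
import itertools, math

n, eps = 6, 0.1                      # eps small -> channel code; eps near 1 -> covering
P = {0: {0: 0.9, 1: 0.1},            # P_{Y|X} for a binary symmetric channel
     1: {0: 0.1, 1: 0.9}}

def p_cond(y, x):
    """P^n_{Y|X}(y | x) for n-sequences, computed symbol by symbol."""
    prob = 1.0
    for xi, yi in zip(x, y):
        prob *= P[xi][yi]
    return prob

outputs = list(itertools.product([0, 1], repeat=n))
used, codebook = set(), []
for x in itertools.product([0, 1], repeat=n):
    # Greedily claim the most likely still-unused outputs as x's decision region.
    region, mass = [], 0.0
    for y in sorted(outputs, key=lambda y: -p_cond(y, x)):
        if mass >= 1 - eps:
            break
        if y not in used:
            region.append(y)
            mass += p_cond(y, x)
    if mass >= 1 - eps:              # keep x only if it still decodes with error <= eps
        codebook.append(x)
        used.update(region)

rate = math.log2(len(codebook)) / n if codebook else 0.0
print(len(codebook), "codewords, rate approximately", rate)
```

Rerunning with eps close to 1 (say 0.9) packs in far more codewords, each with a thin decision region: the same maximal construction then behaves like a covering, i.e., a source code.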

Furthermore, our binning schemes for the coding with side information case and for the broadcast channel coding and distributed source coding problems show that bins are separate codebooks, and that the design of the source codebook and the channel codebook in each bin can be treated independently. Thus the more complex binning schemes for these coding problems can be constructed using the maximal codes for the basic coding problems.

Appendix

A Some Properties of Type and Typical Sequences

Here we give some properties of types and typical sequences which are used intensively in the proofs of our results. Our treatment closely follows that of [8] (Chapter 1.2). We also strengthen some of their results and give detailed proofs.

Given the distribution $P_X$ on $\mathcal{X}$ and two channels $\{\mathcal{X}, P_{Y|X}, \mathcal{Y}\}$ and $\{\mathcal{X}, Q_{Y|X}, \mathcal{Y}\}$, we define
$$H(P_{Y|X}|P_X) = \sum_{x \in \mathcal{X}} P_X(x) H(P_{Y|X}(\cdot|x)), \quad \text{and} \qquad (22)$$
$$D(P_{Y|X}\|Q_{Y|X}|P_X) = \sum_{x \in \mathcal{X}} P_X(x) D(P_{Y|X}(\cdot|x)\|Q_{Y|X}(\cdot|x)). \qquad (23)$$
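For concreteness, these two quantities translate directly into code; below is a small sketch of ours in Python (using natural logarithms to match the $\exp\{\cdot\}$ convention of the paper; the example distributions are arbitrary):

```python
import math

def cond_entropy(PX, PYgX):
    # H(P_{Y|X} | P_X) = sum_x P_X(x) * H(P_{Y|X}(.|x)), equation (22)
    return sum(PX[x] * -sum(p * math.log(p) for p in PYgX[x].values() if p > 0)
               for x in PX)

def cond_divergence(PX, PYgX, QYgX):
    # D(P_{Y|X} || Q_{Y|X} | P_X) = sum_x P_X(x) * D(P(.|x) || Q(.|x)), equation (23)
    return sum(PX[x] * sum(p * math.log(p / QYgX[x][y])
                           for y, p in PYgX[x].items() if p > 0)
               for x in PX)

PX = {0: 0.5, 1: 0.5}
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
Q = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
print(cond_entropy(PX, P), cond_divergence(PX, P, Q))
```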


A.1 Continuity of Entropy and Relative Entropy

First we give three lemmas used in proving some properties of types and typical sequences.

Lemma 1. If $P$ and $Q$ are two distributions on $\mathcal{X}$ such that
$$\sum_{x \in \mathcal{X}} |P(x) - Q(x)| \le \Theta \le 1/2, \qquad (24)$$
then
$$|H(P) - H(Q)| \le -\Theta \log \frac{\Theta}{|\mathcal{X}|}. \qquad (25)$$

Proof. See Lemma 1.2.7 of [8].
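A quick numeric spot-check of this bound on randomly drawn distributions (a sketch of ours in Python; it only samples instances, it is not a proof):

```python
import math, random

def H(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def rand_dist(k):
    v = [random.random() for _ in range(k)]
    s = sum(v)
    return [x / s for x in v]

random.seed(1)
checked = 0
for _ in range(1000):
    p, q = rand_dist(4), rand_dist(4)
    theta = sum(abs(a - b) for a, b in zip(p, q))
    if 0 < theta <= 0.5:                   # hypothesis (24) of the lemma
        # |H(P) - H(Q)| <= -Theta * log(Theta / |X|), equation (25)
        assert abs(H(p) - H(q)) <= -theta * math.log(theta / 4)
        checked += 1
print("bound held on", checked, "sampled pairs")
```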

Lemma 2. If $P$ and $Q$ are two distributions on $\mathcal{X}$ such that
$$\sum_{x \in \mathcal{X}} |P(x) - Q(x)| \le \Theta \le 1/2, \qquad (26)$$
then
$$D(P\|Q) \le -\Theta \log \frac{\Theta}{|\mathcal{X}|^2}. \qquad (27)$$

Proof.
$$\begin{aligned}
D(P\|Q) &= \sum_{x \in \mathcal{X}} P(x)\log P(x) - \sum_{x \in \mathcal{X}} P(x)\log Q(x) \\
&\le \Big| \sum_{x \in \mathcal{X}} P(x)\log P(x) - \sum_{x \in \mathcal{X}} Q(x)\log Q(x) \Big| + \Big| \sum_{x \in \mathcal{X}} (Q(x) - P(x))\log Q(x) \Big| \\
&\le |H(P) - H(Q)| + \sum_{x \in \mathcal{X}} |Q(x) - P(x)| \log|\mathcal{X}| \\
&\le -\Theta \log \frac{\Theta}{|\mathcal{X}|} + \Theta \log|\mathcal{X}| \\
&= -\Theta \log \frac{\Theta}{|\mathcal{X}|^2}. \qquad (28)
\end{aligned}$$

Lemma 3. If the distribution $P_X$ on $\mathcal{X}$ and two channels $\{\mathcal{X}, P_{Y|X}, \mathcal{Y}\}$ and $\{\mathcal{X}, Q_{Y|X}, \mathcal{Y}\}$ satisfy
$$\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} |P_X(x)Q_{Y|X}(y|x) - P_X(x)P_{Y|X}(y|x)| \le \Theta \le 1/2, \qquad (29)$$
then
$$D(Q_{Y|X}\|P_{Y|X}|P_X) \le -\Theta \log \frac{\Theta}{|\mathcal{X}||\mathcal{Y}|^2}. \qquad (30)$$

Proof.
$$\begin{aligned}
D(Q_{Y|X}\|P_{Y|X}|P_X) &= \sum_{a \in \mathcal{X}} P_X(a) D(Q_{Y|X}(\cdot|a)\|P_{Y|X}(\cdot|a)) \\
&\le \Big| \sum_{a \in \mathcal{X}} P_X(a)\big(H(Q_{Y|X}(\cdot|a)) - H(P_{Y|X}(\cdot|a))\big) \Big| + \log|\mathcal{Y}| \sum_{a \in \mathcal{X}}\sum_{b \in \mathcal{Y}} \big|P_X(a)(Q_{Y|X}(b|a) - P_{Y|X}(b|a))\big| \\
&\le |H(Q_{Y|X}|P_X) - H(P_{Y|X}|P_X)| + \Theta \log|\mathcal{Y}| \\
&= |H(P_X, Q_{Y|X}) - H(P_X, P_{Y|X})| + \Theta \log|\mathcal{Y}| \\
&\le -\Theta \log \frac{\Theta}{|\mathcal{X}||\mathcal{Y}|} + \Theta \log|\mathcal{Y}| \\
&= -\Theta \log \frac{\Theta}{|\mathcal{X}||\mathcal{Y}|^2}. \qquad (31)
\end{aligned}$$

A.2 Some Properties of Type and Typical Sequences

First we review the definition of type, joint type and conditional type for completeness.

Definition 1. The type of a sequence $\mathbf{x} \in \mathcal{X}^n$ is the distribution $P_{\mathbf{x}}$ on $\mathcal{X}$ defined by
$$P_{\mathbf{x}}(a) = \frac{1}{n} N(a|\mathbf{x}) \quad \text{for every } a \in \mathcal{X}, \qquad (32)$$
where $N(a|\mathbf{x})$ is the number of occurrences of $a \in \mathcal{X}$ in $\mathbf{x}$.

We denote the set of sequences of type $P$ in $\mathcal{X}^n$ by $T_P$.

Definition 2. If $\mathcal{X}$ and $\mathcal{Y}$ are two finite sets, the joint type of a pair of sequences $\mathbf{x} \in \mathcal{X}^n$ and $\mathbf{y} \in \mathcal{Y}^n$ is the distribution $P_{\mathbf{x},\mathbf{y}}$ on $\mathcal{X} \times \mathcal{Y}$ defined by
$$P_{\mathbf{x},\mathbf{y}}(a,b) = \frac{1}{n} N(a,b|\mathbf{x},\mathbf{y}) \quad \text{for every } a \in \mathcal{X},\ b \in \mathcal{Y}. \qquad (33)$$

Definition 3. Let $Q_{Y|X}$ be a conditional distribution. We say that $\mathbf{y} \in \mathcal{Y}^n$ has conditional type $Q_{Y|X}$ given $\mathbf{x} \in \mathcal{X}^n$ if
$$N(a,b|\mathbf{x},\mathbf{y}) = N(a|\mathbf{x}) Q_{Y|X}(b|a) \quad \text{for every } a \in \mathcal{X},\ b \in \mathcal{Y}. \qquad (34)$$
For any given $\mathbf{x} \in \mathcal{X}^n$ and conditional distribution $Q_{Y|X}$, the set of sequences $\mathbf{y} \in \mathcal{Y}^n$ having conditional type $Q_{Y|X}$ will be called the $Q_{Y|X}$-shell of $\mathbf{x}$, denoted by $T_{Q_{Y|X}}(\mathbf{x})$.


Definition 4. For any distribution $P_X$ on $\mathcal{X}$, a sequence $\mathbf{x} \in \mathcal{X}^n$ is called $P_X$-typical with constant $\delta$ if
$$\left| \frac{1}{n} N(a|\mathbf{x}) - P_X(a) \right| \le \delta \quad \text{for every } a \in \mathcal{X},$$
and, in addition, no $a \in \mathcal{X}$ with $P_X(a) = 0$ occurs in $\mathbf{x}$.

The set of such sequences will be denoted by $T_{[P_X],\delta}$ or $T_{[X],\delta}$.

Definition 5. Given a channel $\{\mathcal{X}, P_{Y|X}, \mathcal{Y}\}$, a sequence $\mathbf{y} \in \mathcal{Y}^n$ is $P_{Y|X}$-typical (or $Y|X$-typical) under the condition $\mathbf{x} \in \mathcal{X}^n$ with constant $\delta$ if
$$\left| \frac{1}{n} N(a,b|\mathbf{x},\mathbf{y}) - \frac{1}{n} N(a|\mathbf{x}) P_{Y|X}(b|a) \right| \le \delta \quad \text{for every } a \in \mathcal{X},\ b \in \mathcal{Y},$$
and, in addition, $N(a,b|\mathbf{x},\mathbf{y}) = 0$ whenever $P_{Y|X}(b|a) = 0$.

The set of such sequences $\mathbf{y}$ will be denoted by $T_{[P_{Y|X}],\delta}(\mathbf{x})$ or $T_{[Y|X],\delta}(\mathbf{x})$.

Remark 1. We use the same Delta-Convention as in [8]. That is, to every set $\mathcal{X}$ we can associate a sequence $\{\delta_n\}_{n=1}^{\infty}$ satisfying
$$\delta_n \to 0, \quad \sqrt{n} \cdot \delta_n \to \infty \quad \text{as } n \to \infty. \qquad (35)$$
Typical sequences are understood with these $\delta_n$'s. In this section, we use $\delta_n$ to define $P_X$-typical sequences and $\delta'_n$ to define $P_{Y|X}$-typical sequences. In later sections, we use $\{\delta_{[X],n}\}$, $\{\delta_{[Y|X],n}\}$, etc. to distinguish the different $\{\delta_n\}$ sequences in the definitions of the different typical sequences. When there is no confusion about which $\delta_n$ is used, we simply write $T_{[X]}$, $T_{[Y|X]}(\mathbf{x})$, etc.
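These definitions translate directly into code; the sketch below (ours, in Python, with a toy sequence) computes the type of a sequence and tests $P_X$-typicality per Definition 4:

```python
from collections import Counter

def type_of(x):
    """Empirical distribution (type) P_x of a sequence, equation (32)."""
    n = len(x)
    return {a: cnt / n for a, cnt in Counter(x).items()}

def is_typical(x, PX, delta):
    """P_X-typicality with constant delta (Definition 4)."""
    Px = type_of(x)
    ok = all(abs(Px.get(a, 0.0) - PX.get(a, 0.0)) <= delta
             for a in set(PX) | set(Px))
    # additionally, no symbol of zero probability may occur in x
    return ok and all(PX.get(a, 0.0) > 0 for a in Px)

x = "abbabaabba"
print(type_of(x))                                  # {'a': 0.5, 'b': 0.5}
print(is_typical(x, {"a": 0.6, "b": 0.4}, 0.15))   # True: |0.5 - 0.6| <= 0.15
```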

Now we give some useful properties.

Lemma 4. The number of different types of sequences in $\mathcal{X}^n$ is no more than $(n+1)^{|\mathcal{X}|}$.

Proof. See the proof of Lemma 1.2.2 in [8] (Chapter 1.2).

Lemma 5. For every type $P$ of sequences in $\mathcal{X}^n$ and every distribution $P_X$ on $\mathcal{X}$, the probability of every $\mathbf{x} \in T_P$ is
$$P_X^n(\mathbf{x}) = \exp\{-n(D(P\|P_X) + H(P))\}, \qquad (36)$$
and the probability of the set $T_P$ satisfies
$$(n+1)^{-|\mathcal{X}|} \exp\{-nD(P\|P_X)\} \le P_X^n(T_P) \le \exp\{-nD(P\|P_X)\}. \qquad (37)$$
Similarly, given a channel $\{\mathcal{X}, P_{Y|X}, \mathcal{Y}\}$ and a conditional distribution $Q_{Y|X}$, for every $\mathbf{x} \in \mathcal{X}^n$ such that $T_{Q_{Y|X}}(\mathbf{x})$ is non-void, the probability of every $\mathbf{y} \in T_{Q_{Y|X}}(\mathbf{x})$ is
$$P_{Y|X}^n(\mathbf{y}|\mathbf{x}) = \exp\{-n(D(Q_{Y|X}\|P_{Y|X}|P_{\mathbf{x}}) + H(Q_{Y|X}|P_{\mathbf{x}}))\}, \qquad (38)$$
and the probability of the set $T_{Q_{Y|X}}(\mathbf{x})$ satisfies
$$(n+1)^{-|\mathcal{X}||\mathcal{Y}|} \exp\{-nD(Q_{Y|X}\|P_{Y|X}|P_{\mathbf{x}})\} \le P_{Y|X}^n(T_{Q_{Y|X}}(\mathbf{x})|\mathbf{x}) \le \exp\{-nD(Q_{Y|X}\|P_{Y|X}|P_{\mathbf{x}})\}. \qquad (39)$$

Proof. See the proof of Lemma 1.2.6 in [8].
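The identity (36) is exact and easy to verify numerically; a small sketch of ours in Python:

```python
import math
from collections import Counter

PX = {"a": 0.7, "b": 0.3}
x = "aababaa"                                  # type P_x = (5/7, 2/7)
n = len(x)

direct = math.prod(PX[c] for c in x)           # P^n_X(x) computed symbol by symbol

Px = {a: cnt / n for a, cnt in Counter(x).items()}
H = -sum(p * math.log(p) for p in Px.values())
D = sum(p * math.log(p / PX[a]) for a, p in Px.items())
via_type = math.exp(-n * (D + H))              # right-hand side of equation (36)

print(direct, via_type)                        # the two values coincide
```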

Lemma 6. There exists a sequence $\epsilon_n \to 0$ as $n \to \infty$ such that for every distribution $P_X$ on $\mathcal{X}$ and every $\mathbf{x} \in T_{[P_X]}$,
$$\exp\{-n(H(X) + \epsilon_n)\} \le P_X^n(\mathbf{x}) \le \exp\{-n(H(X) - \epsilon_n)\}. \qquad (40)$$
Similarly, given a channel $\{\mathcal{X}, P_{Y|X}, \mathcal{Y}\}$, for every $\mathbf{x} \in T_{[P_X]}$ and $\mathbf{y} \in T_{[Y|X]}(\mathbf{x})$, there exists a sequence $\epsilon'_n \to 0$ such that
$$\exp\{-n(H(Y|X) + \epsilon'_n)\} \le P_{Y|X}^n(\mathbf{y}|\mathbf{x}) \le \exp\{-n(H(Y|X) - \epsilon'_n)\}. \qquad (41)$$
Furthermore, $\epsilon_n$ and $\epsilon'_n$ can be chosen to be
$$\epsilon_n = -\delta_n|\mathcal{X}|\log\frac{\delta_n^2}{|\mathcal{X}|}, \quad \text{and} \qquad (42)$$
$$\epsilon'_n = \delta_n|\mathcal{X}|\log|\mathcal{Y}| - |\mathcal{X}||\mathcal{Y}|\delta'_n\log\frac{\delta'^2_n}{|\mathcal{Y}|}. \qquad (43)$$

Proof. Without loss of generality we can assume $\delta_n|\mathcal{X}| \le \frac{1}{2}$; then
$$|H(P_{\mathbf{x}}) - H(P_X)| \le -\delta_n|\mathcal{X}|\log\delta_n, \quad \text{and} \quad D(P_{\mathbf{x}}\|P_X) \le -\delta_n|\mathcal{X}|\log\frac{\delta_n}{|\mathcal{X}|}.$$
From Lemma 5 we know $P_X^n(\mathbf{x}) = \exp\{-n(H(P_{\mathbf{x}}) + D(P_{\mathbf{x}}\|P_X))\}$; therefore
$$P_X^n(\mathbf{x}) \ge \exp\left\{-n\left(-\delta_n|\mathcal{X}|\log\frac{\delta_n}{|\mathcal{X}|} + H(P_X) - \delta_n|\mathcal{X}|\log\delta_n\right)\right\} = \exp\left\{-n\left(H(P_X) + \left(-\delta_n|\mathcal{X}|\log\frac{\delta_n^2}{|\mathcal{X}|}\right)\right)\right\}, \qquad (44)$$
and
$$P_X^n(\mathbf{x}) \le \exp\{-nH(P_{\mathbf{x}})\} \le \exp\{-n(H(P_X) - (-\delta_n|\mathcal{X}|\log\delta_n))\}. \qquad (45)$$
So we can choose $\epsilon_n = -\delta_n|\mathcal{X}|\log\frac{\delta_n^2}{|\mathcal{X}|}$.

Similarly, from Lemma 5 we know
$$P_{Y|X}^n(\mathbf{y}|\mathbf{x}) = \exp\{-n(D(Q_{Y|X}\|P_{Y|X}|P_{\mathbf{x}}) + H(Q_{Y|X}|P_{\mathbf{x}}))\} \quad \text{if } \mathbf{y} \in T_{Q_{Y|X}}(\mathbf{x}). \qquad (46)$$
$T_{[P_{Y|X}]}(\mathbf{x})$ is the union of disjoint $Q_{Y|X}$-shells $T_{Q_{Y|X}}(\mathbf{x})$, all of which satisfy
$$|P_{\mathbf{x}}(a)Q_{Y|X}(b|a) - P_{\mathbf{x}}(a)P_{Y|X}(b|a)| \le \delta'_n \quad \forall a \in \mathcal{X},\ b \in \mathcal{Y}. \qquad (47)$$
Without loss of generality, we assume $\delta'_n|\mathcal{X}||\mathcal{Y}| \le \frac{1}{2}$. Thus
$$|H(Q_{Y|X}|P_{\mathbf{x}}) - H(P_{Y|X}|P_{\mathbf{x}})| \le -|\mathcal{X}||\mathcal{Y}|\delta'_n\log\delta'_n, \quad D(Q_{Y|X}\|P_{Y|X}|P_{\mathbf{x}}) \le -|\mathcal{X}||\mathcal{Y}|\delta'_n\log\frac{\delta'_n}{|\mathcal{Y}|}. \qquad (48)$$
And since $\mathbf{x}$ is $P_X$-typical, we have
$$|H(P_{Y|X}|P_{\mathbf{x}}) - H(P_{Y|X}|P_X)| \le \delta_n|\mathcal{X}|\log|\mathcal{Y}|. \qquad (49)$$
Thus,
$$\begin{aligned}
P_{Y|X}^n(\mathbf{y}|\mathbf{x}) &\ge \exp\left\{-n\left(-|\mathcal{X}||\mathcal{Y}|\delta'_n\log\frac{\delta'_n}{|\mathcal{Y}|} + H(P_{Y|X}|P_{\mathbf{x}}) - |\mathcal{X}||\mathcal{Y}|\delta'_n\log\delta'_n\right)\right\} \\
&\ge \exp\left\{-n\left(-|\mathcal{X}||\mathcal{Y}|\delta'_n\log\frac{\delta'_n}{|\mathcal{Y}|} + H(P_{Y|X}|P_X) + \delta_n|\mathcal{X}|\log|\mathcal{Y}| - |\mathcal{X}||\mathcal{Y}|\delta'_n\log\delta'_n\right)\right\} \\
&= \exp\left\{-n\left(H(P_{Y|X}|P_X) + \left(\delta_n|\mathcal{X}|\log|\mathcal{Y}| - |\mathcal{X}||\mathcal{Y}|\delta'_n\log\frac{\delta'^2_n}{|\mathcal{Y}|}\right)\right)\right\} \qquad (50)
\end{aligned}$$
and
$$\begin{aligned}
P_{Y|X}^n(\mathbf{y}|\mathbf{x}) &\le \exp\{-nH(Q_{Y|X}|P_{\mathbf{x}})\} \\
&\le \exp\{-n(H(P_{Y|X}|P_{\mathbf{x}}) - (-|\mathcal{X}||\mathcal{Y}|\delta'_n\log\delta'_n))\} \\
&\le \exp\{-n(H(P_{Y|X}|P_X) - (\delta_n|\mathcal{X}|\log|\mathcal{Y}| - |\mathcal{X}||\mathcal{Y}|\delta'_n\log\delta'_n))\}. \qquad (51)
\end{aligned}$$
So we can choose $\epsilon'_n = \delta_n|\mathcal{X}|\log|\mathcal{Y}| - |\mathcal{X}||\mathcal{Y}|\delta'_n\log\frac{\delta'^2_n}{|\mathcal{Y}|}$.

Lemma 7. There exist sequences $\epsilon_n \to 0$ and $\epsilon'_n \to 0$ such that every distribution $P_X$ on $\mathcal{X}$ and every channel $\{\mathcal{X}, P_{Y|X}, \mathcal{Y}\}$ satisfy
$$P_X^n\left(T_{[P_X]}\right) \ge 1 - \epsilon_n, \qquad (52)$$
$$P_{Y|X}^n\left(T_{[P_{Y|X}]}(\mathbf{x})\,\Big|\,\mathbf{x}\right) \ge 1 - \epsilon'_n \quad \text{for every } \mathbf{x} \in T_{[P_X]}. \qquad (53)$$
Furthermore, $\epsilon_n$ can be chosen to be
$$\epsilon_n = \exp\{-n\delta_n^2 c_1\} \qquad (54)$$
for some fixed constant $c_1$, and $\epsilon'_n$ can be chosen to be
$$\epsilon'_n = \exp\{-n\delta'^2_n c_2\} \qquad (55)$$
for some fixed constant $c_2$.


Proof. If $\mathbf{x} = x_1 x_2 \ldots x_n$, let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with distributions $P_{Y_i} = P_{Y|X}(\cdot|x_i)$. Then the random variable $N(a,b|\mathbf{x}, Y^n)$ is the sum of $N(a|\mathbf{x})$ i.i.d. binary random variables with distribution $P(1) = P_{Y|X}(b|a)$, $P(0) = 1 - P_{Y|X}(b|a)$. Thus
$$\begin{aligned}
P\left(\frac{1}{n}|N(a,b|\mathbf{x},Y^n) - N(a|\mathbf{x})P_{Y|X}(b|a)| > \delta'_n\right) &\le P\left(\frac{1}{N(a|\mathbf{x})}|N(a,b|\mathbf{x},Y^n) - N(a|\mathbf{x})P_{Y|X}(b|a)| > \delta'_n\right) \\
&= P\left(\left|\frac{N(a,b|\mathbf{x},Y^n)}{N(a|\mathbf{x})} - P_{Y|X}(b|a)\right| > \delta'_n\right) \\
&\le \exp\left(-N(a|\mathbf{x})\,c\,\delta'^2_n\right) \\
&\le \exp\left(-n\,\frac{N(a|\mathbf{x})}{n}\,c\,\delta'^2_n\right) \\
&\le \exp\left(-n P_X(a)\,c\,\delta'^2_n/2\right), \qquad (56)
\end{aligned}$$
where $c > 0$ is a constant (see [1], Theorem 9.4, p. 153). Hence $P_{Y|X}^n(T^n_{[P_{Y|X}]}(\mathbf{x})|\mathbf{x}) \ge 1 - |\mathcal{X}||\mathcal{Y}|\exp\left(-n\left(\min_{a \in \mathcal{X}} P_X(a)\right)c\,\delta'^2_n/2\right)$. So for $n$ large enough, we can choose $\epsilon'_n = \exp\{-c_2 n \delta'^2_n\}$ for some fixed constant $c_2$ in the second equation, and similarly $\epsilon_n = \exp\{-c_1 n \delta_n^2\}$ for some fixed constant $c_1$ in the first equation.
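The concentration behind Lemma 7 can be observed empirically; the following sketch (ours, in Python, with a toy channel and a delta sequence satisfying the Delta-Convention) estimates $P^n_{Y|X}(T_{[P_{Y|X}]}(\mathbf{x})|\mathbf{x})$ by simulation:

```python
import random
from collections import Counter

random.seed(0)
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # toy P_{Y|X}, all entries positive

def conditionally_typical(x, y, delta):
    """Definition 5 (the zero-probability clause is vacuous here)."""
    n = len(x)
    Nxy, Nx = Counter(zip(x, y)), Counter(x)
    return all(abs(Nxy[(a, b)] / n - Nx[a] * P[a][b] / n) <= delta
               for a in (0, 1) for b in (0, 1))

for n in (100, 1000, 10000):
    delta = n ** -0.45              # delta_n -> 0 while sqrt(n) * delta_n -> infinity
    x = [random.randint(0, 1) for _ in range(n)]
    trials = 200
    hits = sum(conditionally_typical(
                   x, [0 if random.random() < P[xi][0] else 1 for xi in x], delta)
               for _ in range(trials))
    print(n, hits / trials)         # essentially 1 already at moderate n
```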

Lemma 8. There exist sequences $\epsilon_n \to 0$ and $\epsilon'_n \to 0$ such that every distribution $P_X$ on $\mathcal{X}$ and channel $\{\mathcal{X}, P_{Y|X}, \mathcal{Y}\}$ satisfy
$$\left|\frac{1}{n}\log\left|T_{[P_X]}\right| - H(P_X)\right| \le \epsilon_n, \qquad (57)$$
$$\left|\frac{1}{n}\log\left|T_{[P_{Y|X}]}(\mathbf{x})\right| - H(P_{Y|X}|P_X)\right| \le \epsilon'_n \quad \text{for every } \mathbf{x} \in T_{[P_X]}. \qquad (58)$$
Furthermore, $\epsilon_n$ and $\epsilon'_n$ can be chosen to be
$$\epsilon_n = \frac{1}{n}|\mathcal{X}|\log(n+1) - |\mathcal{X}|\delta_n\log\delta_n, \qquad (59)$$
and
$$\epsilon'_n = \frac{1}{n}|\mathcal{X}||\mathcal{Y}|\log(n+1) + \delta_n|\mathcal{X}|\log|\mathcal{Y}| - |\mathcal{X}||\mathcal{Y}|\delta'_n\log\delta'_n. \qquad (60)$$

Proof. The expressions for $\epsilon_n$ and $\epsilon'_n$ are implied in the proof of Lemma 1.2.13 of [8].

Lemma 9. Given $0 < \eta < 1$, there exist $\epsilon_n \to 0$ and $\epsilon'_n \to 0$ such that

(i) if $A \subset \mathcal{X}^n$ and $P_X^n(A) \ge \eta$, then
$$\frac{1}{n}\log|A| \ge H(P_X) - \epsilon_n, \qquad (61)$$
(ii) if $B \subset \mathcal{Y}^n$ and $P_{Y|X}^n(B|\mathbf{x}) \ge \eta$, then
$$\frac{1}{n}\log|B| \ge H(P_{Y|X}|P_{\mathbf{x}}) - \epsilon'_n. \qquad (62)$$
Furthermore, $\epsilon_n$ and $\epsilon'_n$ can be chosen to be
$$\epsilon_n = -\frac{1}{n}\log\frac{\eta}{2} - |\mathcal{X}|\delta_n\log\delta_n + \frac{1}{n}|\mathcal{X}|\log(n+1), \qquad (63)$$
and
$$\epsilon'_n = -\frac{1}{n}\log\frac{\eta}{2} - |\mathcal{X}||\mathcal{Y}|\delta'_n\log\delta'_n + \frac{1}{n}|\mathcal{X}||\mathcal{Y}|\log(n+1). \qquad (64)$$

Corollary 1. Given $0 < \eta < 1$, there exists a sequence $\epsilon''_n \to 0$ such that if $B \subset \mathcal{Y}^n$ and $P_{Y|X}^n(B|\mathbf{x}) \ge \eta$ for some $\mathbf{x} \in T_{[P_X]}$, then
$$\frac{1}{n}\log|B| \ge H(P_{Y|X}|P_X) - \epsilon''_n. \qquad (65)$$
Furthermore, $\epsilon''_n$ can be chosen to be
$$\epsilon''_n = -\frac{1}{n}\log\frac{\eta}{2} - |\mathcal{X}||\mathcal{Y}|\delta'_n\log\delta'_n + \frac{1}{n}|\mathcal{X}||\mathcal{Y}|\log(n+1) + \delta_n|\mathcal{X}|\log|\mathcal{Y}|. \qquad (66)$$

Proof. The lemma and the corollary follow directly from the proof of Corollary 1.2.14 of [8] and the previous lemma.

Remark 2. With the exception of Lemma 7, the sequences $\{\epsilon_n\}$, $\{\epsilon'_n\}$ and $\{\epsilon''_n\}$ in this section are of the order $-\delta_n\log\delta_n$ for $n$ large enough if we choose $\delta_n = \delta'_n$.

B Proof of Theorem 1

Proof. Our proof closely follows the proof of the Maximal Code Lemma in [8] (Lemma 3 in Section 2.1).

From Lemma 7 we know that there exist an $n_1$ and a constant $c$ such that when $n > n_1$,
$$P_{Y|X}^n\left(T_{[Y|X]}(\mathbf{x})\,\big|\,\mathbf{x}\right) \ge 1 - \exp\left\{-n\delta^2_{[Y|X],n}\,c\right\}, \quad \text{for every } \mathbf{x} \in T_{[X]}. \qquad (67)$$

Fix a constant $d < c$, and let $(f, \phi)$ be an $\left(n, \exp\left\{-n\delta^2_{[Y|X],n}\,d\right\}\right)$-code generated by the greedy algorithm described in Section 2.2. Obviously this code satisfies the requirement on the encoding map (equation (2)) and on the decision regions (equation (3)). Furthermore, the probability of error $\epsilon_n = \exp\left\{-n\delta^2_{[Y|X],n}\,d\right\} \to 0$.

Let
$$B \triangleq \bigcup_{m \in \mathcal{M}_f} \phi^{-1}(m). \qquad (68)$$
The greedy code construction stops when
$$P_{Y|X}^n\left(T_{[Y|X]}(\mathbf{x}) - B\,\big|\,\mathbf{x}\right) < 1 - \exp\left\{-n\delta^2_{[Y|X],n}\,d\right\}, \quad \text{for every } \mathbf{x} \in T_{[X]}. \qquad (69)$$
From equations (67) and (69) we have
$$P_{Y|X}^n(B|\mathbf{x}) > \exp\left\{-n\delta^2_{[Y|X],n}\,d\right\} - \exp\left\{-n\delta^2_{[Y|X],n}\,c\right\}, \quad \text{for every } \mathbf{x} \in T_{[X]}. \qquad (70)$$
Fix a $\tau \in (0,1)$; we have $P_X^n\left(T_{[X]}\right) > 1 - \tau$ for $n > n_2$. Therefore,
$$P_Y^n(B) > \left(\exp\left\{-n\delta^2_{[Y|X],n}\,d\right\} - \exp\left\{-n\delta^2_{[Y|X],n}\,c\right\}\right)(1 - \tau). \qquad (71)$$
Let $\delta_{[Y],n} = \left(\delta_{[Y|X],n} + \delta_{[X],n}\right)|\mathcal{X}|$; then $B \subset T_{[Y]}$. Therefore, from Lemma 9 we get that for $n > n_3$,
$$|B| > \exp\{n(H(Y) - \epsilon''_n)\}, \qquad (72)$$
where
$$\epsilon''_n = d\delta^2_{[Y|X],n} - \frac{1}{n}\log\left\{\left(1 - \exp\left\{-n\delta^2_{[Y|X],n}(c - d)\right\}\right)(1 - \tau)\right\} - |\mathcal{Y}|\delta_{[Y],n}\log\delta_{[Y],n} + \frac{1}{n}|\mathcal{Y}|\log(n+1). \qquad (73)$$
On the other hand, by Lemma 8, for $n > n_4$ we have
$$|B| = \sum_{m \in \mathcal{M}_f} \left|\phi^{-1}(m)\right| \le \sum_{m \in \mathcal{M}_f} \left|T_{[Y|X]}(f(m))\right| \le |\mathcal{M}_f| \cdot \exp\{n(H(Y|X) + \epsilon'_n)\}, \qquad (74)$$
where
$$\epsilon'_n = \frac{1}{n}|\mathcal{X}||\mathcal{Y}|\log(n+1) + \delta_{[X],n}|\mathcal{X}|\log|\mathcal{Y}| - |\mathcal{X}||\mathcal{Y}|\delta_{[Y|X],n}\log\delta_{[Y|X],n}. \qquad (75)$$
Comparing equations (72) and (74), we get that for $n > \max(n_1, n_2, n_3, n_4)$,
$$\frac{1}{n}\log|\mathcal{M}_f| \ge I(X;Y) - \alpha_n, \qquad (76)$$
where
$$\begin{aligned}
\alpha_n = \epsilon''_n + \epsilon'_n = \ & d\delta^2_{[Y|X],n} - \frac{1}{n}\log\left\{\left(1 - \exp\left\{-n\delta^2_{[Y|X],n}(c - d)\right\}\right)(1 - \tau)\right\} \\
& - |\mathcal{Y}|\delta_{[Y],n}\log\delta_{[Y],n} + \frac{1}{n}|\mathcal{Y}|\log(n+1) \\
& + \frac{1}{n}|\mathcal{X}||\mathcal{Y}|\log(n+1) + \delta_{[X],n}|\mathcal{X}|\log|\mathcal{Y}| - |\mathcal{X}||\mathcal{Y}|\delta_{[Y|X],n}\log\delta_{[Y|X],n}. \qquad (77)
\end{aligned}$$
Hence $\alpha_n \to 0$ as $n \to \infty$.


C Proof of Theorem 2

First we define the notion of η-image:

Definition 6. A set $B \subset \mathcal{Y}$ is an $\eta$-image ($0 < \eta \le 1$) of a set $A \subset \mathcal{X}$ over a channel $\{\mathcal{X}, P_{Y|X}, \mathcal{Y}\}$ if $P_{Y|X}(B|x) \ge \eta$ for every $x \in A$. The minimum cardinality of an $\eta$-image of $A$ is denoted by $g(A, \eta)$.

Proof. This is a refined version of Lemma 2.1.4 in [8] (Section 2.1). Our proof closely follows the proof there.

Fix a $\tau \in (0,1)$, let $(f, \phi)$ be any $(n, \epsilon)$-code for the DMC $\{\mathcal{X}, P_{Y|X}, \mathcal{Y}\}$, and let $A \triangleq \{f(m) : m \in \mathcal{M}_f\} \subset T_{[P]}$. Let $B \subset \mathcal{Y}^n$ be an $(\epsilon+\tau)$-image of $A$ with $|B| = g(A, \epsilon+\tau)$. Since $P_{Y|X}^n\left(\phi^{-1}(m)\,\big|\,f(m)\right) \ge 1 - \epsilon$, we have $P_{Y|X}^n\left(B \cap \phi^{-1}(m)\,\big|\,f(m)\right) \ge \tau$ for every $m \in \mathcal{M}_f$. By applying Lemma 9, we see that for $n > n_1$
$$|B \cap \phi^{-1}(m)| \ge \exp\{n(H(Y|X) - \epsilon''_n)\}, \qquad (78)$$
where
$$\epsilon''_n = -\frac{1}{n}\log\frac{\tau}{2} - |\mathcal{X}||\mathcal{Y}|\delta_{[Y|X],n}\log\delta_{[Y|X],n} + \frac{1}{n}|\mathcal{X}||\mathcal{Y}|\log(n+1) + \delta_{[X],n}|\mathcal{X}|\log|\mathcal{Y}|. \qquad (79)$$
Since the $\phi^{-1}(m)$ are disjoint, we have
$$|B| \ge \sum_{m \in \mathcal{M}_f} \left|B \cap \phi^{-1}(m)\right| \ge |\mathcal{M}_f| \exp\{n(H(Y|X) - \epsilon''_n)\}. \qquad (80)$$
On the other hand, $A \subset T_{[P]}$ implies
$$|B| = g(A, \epsilon+\tau) \le g\left(T_{[P]}, \epsilon+\tau\right). \qquad (81)$$
Since $T_{[Y]}$ is an $(\epsilon+\tau)$-image of every subset of $T_{[P]}$, provided that $\delta_{[Y],n} = (\delta_{[X],n} + \delta_{[Y|X],n})|\mathcal{X}|$, we have that for $n > n_2$
$$g\left(T_{[P]}, \epsilon+\tau\right) \le \left|T_{[Y]}\right| \le \exp\{n(H(Y) + \epsilon_n)\}, \qquad (82)$$
where
$$\epsilon_n = \frac{1}{n}|\mathcal{Y}|\log(n+1) - |\mathcal{Y}|\delta_{[Y],n}\log\delta_{[Y],n}. \qquad (83)$$
Thus for $n > \max(n_1, n_2)$,
$$\frac{1}{n}\log|\mathcal{M}_f| \le I(X;Y) + \beta_n, \qquad (84)$$
where
$$\begin{aligned}
\beta_n = \epsilon''_n + \epsilon_n = \ & -\frac{1}{n}\log\frac{\tau}{2} - |\mathcal{X}||\mathcal{Y}|\delta_{[Y|X],n}\log\delta_{[Y|X],n} + \frac{1}{n}|\mathcal{X}||\mathcal{Y}|\log(n+1) + \delta_{[X],n}|\mathcal{X}|\log|\mathcal{Y}| \\
& + \frac{1}{n}|\mathcal{Y}|\log(n+1) - |\mathcal{Y}|\delta_{[Y],n}\log\delta_{[Y],n}. \qquad (85)
\end{aligned}$$
Hence $\beta_n \to 0$ as $n \to \infty$.


D Proof of Theorem 3

Proof. Consider the binning scheme constructed greedily using the algorithm described in Section 3.2. Obviously this scheme satisfies requirements 1 and 2 of the theorem. In the following we calculate the rate.

Let $E^{(i)}$ denote the set of common codewords in the $i$-th bin, and
$$A^{(i)} \triangleq \bigcup_{m \in \mathcal{M}_i} \phi^{(i)-1}(m), \qquad (86)$$
$$B^{(i)} \triangleq \bigcup_{m \in \mathcal{M}_i} \psi^{(i)-1}(m), \qquad (87)$$
$$C^{(i)} \triangleq \left\{ \mathbf{u} \in T_{[U]} : P_{X|U}^n\left(\bigcup_{k=1}^{i} A^{(k)}\,\Big|\,\mathbf{u}\right) > \epsilon_1 - \tau_1 \right\}, \qquad (88)$$
$$D^{(i)} \triangleq \left\{ \mathbf{u} \in T_{[U]} : P_{S|U}^n\left(B^{(i)}\,\big|\,\mathbf{u}\right) > \epsilon_2 - \tau_2 \right\}. \qquad (89)$$

The existence of the first codebook pair $(f^{(1)}, \phi^{(1)})$ and $(f^{(1)}, \psi^{(1)})$, which has rate (12) for $i = 1$, is the result of Lemma 3.3.8 in [8] (Chapter 3.3). We give the proof here for later use.

Consider an $(n, \epsilon_1)$ code $(f^{(1)}, \phi^{(1)})$ for the DMC $\{\mathcal{U}, P_{X|U}, \mathcal{X}\}$ and an $(n, \epsilon_2)$ code $(f^{(1)}, \psi^{(1)})$ for the DMC $\{\mathcal{U}, P_{S|U}, \mathcal{S}\}$, with codewords $f^{(1)}(m) \in T_{[U]}$ generated by the first step of the greedy algorithm. When the first step of the greedy algorithm stops, we have
$$P_{X|U}^n\left(T_{[X|U]}(\mathbf{u}) - A^{(1)}\,\big|\,\mathbf{u}\right) \le 1 - \epsilon_1 \quad \text{or} \quad P_{S|U}^n\left(T_{[S|U]}(\mathbf{u}) - B^{(1)}\,\big|\,\mathbf{u}\right) \le 1 - \epsilon_2, \quad \forall \mathbf{u} \in T_{[U]}.$$
Fix $0 < \tau_1 < \epsilon_1$ and $0 < \tau_2 < \epsilon_2$; as in the proof of Theorem 1, for $n > n_1$ we have
$$P_{X|U}^n\left(A^{(1)}\,\big|\,\mathbf{u}\right) > \epsilon_1 - \tau_1 \quad \text{or} \quad P_{S|U}^n\left(B^{(1)}\,\big|\,\mathbf{u}\right) > \epsilon_2 - \tau_2, \quad \forall \mathbf{u} \in T_{[U]}. \qquad (90)$$
That is,
$$\mathbf{u} \in C^{(1)} \ \text{or} \ \mathbf{u} \in D^{(1)}, \quad \forall \mathbf{u} \in T_{[U]}. \qquad (91)$$

Given $\eta \in (0,1)$, if $P_U^n\left(C^{(1)}\right) > \eta$, we have
$$P_X^n\left(A^{(1)}\right) > \eta(\epsilon_1 - \tau_1). \qquad (92)$$
Then by Corollary 1, for $n > n_2$ we have
$$\left|A^{(1)}\right| \ge \exp\{n(H(X) - \tau)\}. \qquad (93)$$
On the other hand, for $n > n_3$ we have
$$\left|A^{(1)}\right| \le \sum_{m \in \mathcal{M}_{f^{(1)}}} \left|\phi^{(1)-1}(m)\right| \le \sum_{m \in \mathcal{M}_{f^{(1)}}} \left|T_{[X|U]}(f(m))\right| \le \left|\mathcal{M}_{f^{(1)}}\right| \exp\{n(H(X|U) + \tau)\}. \qquad (94)$$


Thus,
$$\left|\mathcal{M}_{f^{(1)}}\right| \ge \exp\{n(I(U;X) - 2\tau)\}. \qquad (95)$$
But by applying the converse coding theorem to the DMC $\{\mathcal{U}, P_{S|U}, \mathcal{S}\}$ we know that
$$\left|\mathcal{M}_{f^{(1)}}\right| < \exp\{n(I(U;S) + 2\tau)\}. \qquad (96)$$
Hence for $n$ large enough we have a contradiction. This shows that for any $\eta \in (0,1)$,
$$P_U^n\left(C^{(1)}\right) \le \eta. \qquad (97)$$
From (91) we know that
$$T_{[U]} - D^{(1)} \subset C^{(1)}. \qquad (98)$$
Thus,
$$P_U^n\left(T_{[U]} - D^{(1)}\right) \le \eta. \qquad (99)$$
So, fixing $\tau_3 \in (0, 1 - \eta)$, we have
$$P_U^n\left(D^{(1)}\right) > 1 - \tau_3 - \eta \qquad (100)$$
for $n > n_4$. Therefore,
$$P_S^n\left(B^{(1)}\right) > (\epsilon_2 - \tau_2)(1 - \tau_3 - \eta). \qquad (101)$$

Thus, by Corollary 1, for $n$ large enough we have
$$\left|B^{(1)}\right| \ge \exp\{n(H(S) - \tau)\}. \qquad (102)$$
On the other hand, for $n$ large enough we have
$$\left|B^{(1)}\right| \le \sum_{m \in \mathcal{M}_{f^{(1)}}} \left|\psi^{(1)-1}(m)\right| \le \sum_{m \in \mathcal{M}_{f^{(1)}}} \left|T_{[S|U]}(f(m))\right| \le \left|\mathcal{M}_{f^{(1)}}\right| \exp\{n(H(S|U) + \tau)\}. \qquad (103)$$
Combining the previous two inequalities, we have
$$\left|\mathcal{M}_{f^{(1)}}\right| \ge \exp\{n(I(U;S) - 2\tau)\}. \qquad (104)$$

Now suppose we already have a family of $i$ pairs of codes with disjoint $A^{(j)}$, $j = 1, \cdots, i$, where the rate of each codebook satisfies (12) and the code family satisfies
$$P_U^n(C^{(i)}) \le \eta. \qquad (105)$$
We want to add the $(i+1)$-th pair to the family; the codewords of the $(i+1)$-th codebook are chosen from the set $T_{[U]} - C^{(i)}$, and $A^{(i+1)}$ is disjoint from $\bigcup_{j \le i} A^{(j)}$.

For any $\mathbf{u} \in T_{[U]} - C^{(i)}$ we have
$$P_{X|U}^n\left(\bigcup_{j=1}^{i} A^{(j)}\,\Big|\,\mathbf{u}\right) \le \epsilon_1 - \tau_1, \qquad (106)$$
hence
$$P_{X|U}^n\left(T_{[X|U]}(\mathbf{u}) - \bigcup_{j=1}^{i} A^{(j)}\,\Big|\,\mathbf{u}\right) > 1 - \epsilon_1. \qquad (107)$$
In addition, we have
$$P_{S|U}^n\left(T_{[S|U]}(\mathbf{u})\,\big|\,\mathbf{u}\right) > 1 - \epsilon_2. \qquad (108)$$
From the above two equations, it is clear that we can choose any $\mathbf{u} \in T_{[U]} - C^{(i)}$ as the first codeword in the $(i+1)$-th pair of codebooks.

Then we keep adding codewords to the $(i+1)$-th pair of codebooks until we can no longer do so. As in the case of the first codebook pair, we know that for $n > n_1$,
$$P_{X|U}^n\left(\bigcup_{j=1}^{i+1} A^{(j)}\,\Big|\,\mathbf{u}\right) > \epsilon_1 - \tau_1 \quad \text{or} \quad P_{S|U}^n\left(B^{(i+1)}\,\big|\,\mathbf{u}\right) > \epsilon_2 - \tau_2 \quad \forall \mathbf{u} \in T_{[U]} - C^{(i)}. \qquad (109)$$
But for $\mathbf{u} \in C^{(i)}$, we have
$$P_{X|U}^n\left(\bigcup_{j=1}^{i+1} A^{(j)}\,\Big|\,\mathbf{u}\right) \ge P_{X|U}^n\left(\bigcup_{j=1}^{i} A^{(j)}\,\Big|\,\mathbf{u}\right) > \epsilon_1 - \tau_1. \qquad (110)$$
Therefore,
$$P_{X|U}^n\left(\bigcup_{j=1}^{i+1} A^{(j)}\,\Big|\,\mathbf{u}\right) > \epsilon_1 - \tau_1 \quad \text{or} \quad P_{S|U}^n\left(B^{(i+1)}\,\big|\,\mathbf{u}\right) > \epsilon_2 - \tau_2 \quad \forall \mathbf{u} \in T_{[U]}. \qquad (111)$$

From the definition of $C^{(i)}$ we can easily see that $C^{(i)}$ grows as $i$ increases. So after adding the $(i+1)$-th pair to our codebook family, there are two possibilities:

(1)
$$P_U^n\left(C^{(i+1)}\right) \le \eta. \qquad (112)$$
In this case, following the same argument as in the first codebook pair case, we have
$$\frac{1}{n}\log\left|\mathcal{M}_{f^{(i+1)}}\right| \ge I(U;S) - 2\tau, \qquad (113)$$
and we continue our construction of the codebook family by adding the next codebook pair.

(2)
$$P_U^n\left(C^{(i+1)}\right) > \eta, \qquad (114)$$
and the construction of the codebook family stops.

Assume the construction stops when $i = M - 1$. From the above discussion, it is clear that we have a codebook family, with disjoint $A^{(j)}$, $j = 1, \cdots, M$, such that
$$P_U^n\left(C^{(M-1)}\right) \le \eta, \qquad (115)$$


and the rate of the codebook pairs for $j = 1, \cdots, M-1$ satisfies (12). For the $M$-th codebook pair, we have
$$P_U^n\left(C^{(M)}\right) > \eta. \qquad (116)$$
In this case, we have
$$P_X^n\left(\bigcup_{j=1}^{M} A^{(j)}\right) > \eta(\epsilon_1 - \tau_1). \qquad (117)$$
Hence, following steps similar to those leading to (95), we get
$$\sum_{j=1}^{M} \left|\mathcal{M}_{f^{(j)}}\right| \ge \exp\{n(I(U;X) - 2\tau)\}. \qquad (118)$$
On the other hand, for $j = 1, \cdots, M$, we have
$$\left|\mathcal{M}_{f^{(j)}}\right| \le \exp\{n(I(U;S) + 2\tau)\}. \qquad (119)$$
Hence from (118) and (119), we have
$$M \ge \exp\{n(I(U;X) - I(U;S) - 4\tau)\}. \qquad (120)$$
From the disjointness of the $A^{(j)}$, $j = 1, \cdots, M$, we know that $\bigcup_{j=1}^{M} E^{(j)}$ is the codeword set of an $(n, \epsilon_1)$ code for the DMC $\{\mathcal{U}, P_{X|U}, \mathcal{X}\}$; therefore
$$\sum_{j=1}^{M} \left|\mathcal{M}_{f^{(j)}}\right| \le \exp\{n(I(U;X) + 2\tau)\}. \qquad (121)$$
In the meantime, for $j = 1, \cdots, M-1$ we have
$$\left|\mathcal{M}_{f^{(j)}}\right| \ge \exp\{n(I(U;S) - 2\tau)\}. \qquad (122)$$
Hence from (121) and (122), we have
$$M - 1 \le \exp\{n(I(U;X) - I(U;S) + 4\tau)\}. \qquad (123)$$

Remark 3. From our refined proof of the Maximal Code Lemma, we can see that if we replace $\tau_1$ by a sequence $\tau_{1,n} = \exp\{-c_2 n \delta^2_{[X|U],n}\}$, where $c_2$ is the constant in equation (55), and replace $\epsilon_1$ by a sequence $\epsilon_{1,n} = \exp\{-d_1 n \delta^2_{[X|U],n}\}$ with $d_1 < c_2$, the result still holds. Similarly, if we replace $\tau_2$ by $\tau_{2,n} = \exp\{-c_2 n \delta^2_{[S|U],n}\}$ and $\epsilon_2$ by $\epsilon_{2,n} = \exp\{-d_1 n \delta^2_{[S|U],n}\}$ with $d_1 < c_2$, the result still holds.


E Proof of Theorem 4

Proof. For a given message $m$ there are two error events:

(1) $E_1$: For channel side information $\mathbf{s}$, the sender cannot find a $\mathbf{u}$ such that $\mathbf{u} = f^{(m)}(\psi^{(m)}(\mathbf{s}))$. This happens when $\mathbf{s} \notin B^{(m)}$.

(2) $E_2$: The channel side information $\mathbf{s} \in B^{(m)}$, so the sender can find a $\mathbf{u}$ satisfying $\mathbf{u} = f^{(m)}(\psi^{(m)}(\mathbf{s}))$, but the received $\mathbf{x}$ does not satisfy $\mathbf{u} = f^{(m)}(\phi^{(m)}(\mathbf{x}))$. This happens when $\mathbf{x} \notin A^{(m)}$.

For a given message $m$, we have (cf. equation (101))
$$P(E_1) = P\left(S^n \notin B^{(m)}\right) = 1 - P_S^n\left(B^{(m)}\right) < 1 - (\epsilon_2 - \tau_2)(1 - \tau_3 - \eta). \qquad (124)$$

To calculate $P(E_2)$, let the codewords in the $m$-th code pair be $\mathbf{u}_k^{(m)}$, $k = 1, 2, \cdots, M_m$, where $M_m = \left|\mathcal{M}_{f^{(m)}}\right|$. Let $A_k^{(m)} \triangleq \phi^{(m)-1}(k)$ and $B_k^{(m)} \triangleq \psi^{(m)-1}(k)$; then
$$P(E_2) = P\left(X^n \notin A^{(m)}\,\big|\,\text{message } m\right) = \sum_{\mathbf{s} \in B^{(m)}} P\left(X^n \notin A^{(m)}\,\big|\,\text{message } m, \mathbf{s}\right) P_S^n(\mathbf{s}). \qquad (125)$$

For a given $\mathbf{s} \in B^{(m)}$ there exists a $k$ such that $\mathbf{s} \in B_k^{(m)}$. The sender sends $\tilde{\mathbf{x}} = h\left(\mathbf{u}_k^{(m)}, \mathbf{s}\right)$. For this $\mathbf{s}$, we have
$$P\left(X^n \notin A^{(m)}\,\big|\,\text{message } m, \mathbf{s}\right) \le P\left(X^n \notin A_k^{(m)}\,\big|\,\tilde{\mathbf{x}}, \mathbf{s}\right) = P\left(X^n \notin A_k^{(m)}\,\big|\,\mathbf{u}_k^{(m)}, \mathbf{s}\right), \qquad (126)$$
where the last equality follows from
$$P(\mathbf{x}|\mathbf{u},\mathbf{s}) = \sum_{\tilde{\mathbf{x}}} P(\mathbf{x}|\mathbf{u},\mathbf{s},\tilde{\mathbf{x}})\,P(\tilde{\mathbf{x}}|\mathbf{u},\mathbf{s}) = P(\mathbf{x}|\tilde{\mathbf{x}}(=h(\mathbf{u},\mathbf{s})),\mathbf{s}), \qquad (127)$$
with $\tilde{\mathbf{x}}$ denoting the transmitted sequence.

Thus,
$$P(E_2) \le \sum_{k} \sum_{\mathbf{s} \in B_k^{(m)}} P\left(A_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}, \mathbf{s}\right) P_S^n(\mathbf{s}). \qquad (128)$$

Choose $\epsilon_{1,n} = \exp\{-d_1 n \delta^2_{[X|U],n}\}$ with $d_1 < c_2$, where $c_2$ is the constant in equation (55). Given a sequence $\{\epsilon_n\}$ such that $\epsilon_n = \sqrt{\epsilon_{1,n}} = \exp\{-n\delta^2_{[X|U],n}\,d_1/2\}$, for each $m$ and $k$ define
$$G_k^{(m)} \triangleq \left\{ \mathbf{s} \in B_k^{(m)} : \Pr\left\{ A_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}, \mathbf{s} \right\} > \epsilon_n \right\}. \qquad (129)$$

We can think of $\bigcup_{k \le M_m} G_k^{(m)}$ as the set of side information sequences $\mathbf{s}$ which cause a probability of decoding error larger than $\epsilon_n$. As long as $P_S^n\left(\bigcup_{k \le M_m} G_k^{(m)}\right)$ is small, we can make $P(E_2)$ small. The following lemma shows that by carefully choosing the sequences $\{\delta_{[U],n}\}$, $\{\delta_{[S|U],n}\}$ and $\{\delta_{[X|U],n}\}$, $P_S^n\left(\bigcup_{k \le M_m} G_k^{(m)}\right)$ can be made arbitrarily small.


Lemma 10. There exist sequences $\{\delta_{[U],n}\}$, $\{\delta_{[S|U],n}\}$ and $\{\delta_{[X|U],n}\}$ for the definitions of $T_{[U]}$, $T_{[S|U]}$ and $T_{[X|U]}$ respectively such that
$$P_S^n\left(\bigcup_{k \le M_m} G_k^{(m)}\right) \to 0 \quad \text{as } n \to \infty. \qquad (130)$$

Proof. For a given sequence $\{\lambda_n\}$ with $\lambda_n = \frac{1}{2} d_1 \delta^2_{[X|U],n}$, suppose that
$$P\left(G_k^{(m)}\,\Big|\,\mathbf{u}_k^{(m)}\right) > \exp\{-n\lambda_n\}. \qquad (131)$$
Then
$$\begin{aligned}
P_{X|U}^n\left(A_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}\right) &> \sum_{\mathbf{s} \in G_k^{(m)}} P_{X|SU}^n\left(A_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}, \mathbf{s}\right) P_{S|U}^n\left(\mathbf{s}\,\Big|\,\mathbf{u}_k^{(m)}\right) \\
&> \epsilon_n \exp\{-n\lambda_n\} = \exp\left\{-d_1 n \delta^2_{[X|U],n}\right\}. \qquad (132)
\end{aligned}$$
But from the definition of the $m$-th code for the channel $\{\mathcal{U}, P_{X|U}, \mathcal{X}\}$, we have
$$P_{X|U}^n\left(A_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}\right) < \exp\left\{-d_1 n \delta^2_{[X|U],n}\right\}, \qquad (133)$$
leading to a contradiction. Therefore we have
$$P_{S|U}^n\left(G_k^{(m)}\,\Big|\,\mathbf{u}_k^{(m)}\right) < \exp\{-n\lambda_n\}. \qquad (134)$$
Since for $\mathbf{s} \in G_k^{(m)}$ we have $\mathbf{s} \in T^n_{[S|U]}\left(\mathbf{u}_k^{(m)}\right)$, by Lemma 6 we get
$$P_{S|U}^n\left(\mathbf{s}\,\big|\,\mathbf{u}_k^{(m)}\right) \ge \exp\{-n(H(S|U) + \rho_{1,n})\}, \qquad (135)$$
where
$$\rho_{1,n} = \delta_{[U],n}|\mathcal{U}|\log|\mathcal{S}| - |\mathcal{U}||\mathcal{S}|\delta_{[S|U],n}\log\frac{\delta^2_{[S|U],n}}{|\mathcal{S}|}. \qquad (136)$$
From the previous two inequalities we have
$$\left|G_k^{(m)}\right| < \exp\{n(H(S|U) + \rho_{1,n} - \lambda_n)\}. \qquad (137)$$
From Theorem 2 we know
$$M_m < \exp\{n(I(U;S) + \rho_{2,n})\}, \qquad (138)$$
where (cf. equation (85))
$$\begin{aligned}
\rho_{2,n} = \ & -\frac{1}{n}\log\frac{\tau_2}{2} - |\mathcal{U}||\mathcal{S}|\delta_{[S|U],n}\log\delta_{[S|U],n} + \frac{1}{n}|\mathcal{U}||\mathcal{S}|\log(n+1) + \delta_{[U],n}|\mathcal{U}|\log|\mathcal{S}| \\
& + \frac{1}{n}|\mathcal{S}|\log(n+1) - |\mathcal{S}|\delta_{[S],n}\log\delta_{[S],n}. \qquad (139)
\end{aligned}$$


Hence
$$\left|\bigcup_{k \le M_m} G_k^{(m)}\right| < \exp\{n(H(S|U) + \rho_{1,n} - \lambda_n)\}\exp\{n(I(U;S) + \rho_{2,n})\}. \qquad (140)$$
Finally, for $\mathbf{s} \in G_k^{(m)}$ we have $\mathbf{s} \in T^n_{[S]}$. Thus from Lemma 6 we get
$$P_S^n(\mathbf{s}) \le \exp\{-n(H(S) - \rho_{3,n})\}, \qquad (141)$$
where
$$\rho_{3,n} = -\delta_{[S],n}|\mathcal{S}|\log\frac{\delta^2_{[S],n}}{|\mathcal{S}|}. \qquad (142)$$
Combining the previous two inequalities we get
$$P_S^n\left(\bigcup_{k \le M_m} G_k^{(m)}\right) < \exp\{-n(\lambda_n - \rho_{1,n} - \rho_{2,n} - \rho_{3,n})\}. \qquad (143)$$

Now we can choose $\delta_{[U],n} = \delta_{[S|U],n} = \delta_n$, $\delta_{[S],n} = 2\delta_n|\mathcal{U}|$ and $\delta_{[X|U],n} = \delta'_n$. Then for $n$ large enough we have $\rho_{1,n} + \rho_{2,n} + \rho_{3,n} \le -d_2\delta_n\log\delta_n$ for some constant $d_2 > 0$. So by choosing $-d_2\delta_n\log\delta_n < \lambda_n = d_1\delta'^2_n/2$, for example by choosing $\delta_n = \frac{1}{n^{1/4}}$ and $\delta'_n = \frac{1}{n^{1/16}}$, we can find a $\lambda_n$ such that
$$P_{X|US}^n\left(A_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}, \mathbf{s}\right) < \epsilon_n = (\epsilon_{1,n})^{1/2} \quad \text{for } \mathbf{s} \in B_k^{(m)} - G_k^{(m)}, \qquad (144)$$
$$P_S^n\left(\bigcup_{k \le M_m} G_k^{(m)}\right) < (\epsilon_{1,n})^{1/4} \qquad (145)$$
for large enough $n$.
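The inequality $-d_2\delta_n\log\delta_n < \lambda_n = d_1\delta'^2_n/2$ only needs to hold for $n$ large enough; a quick numeric check (ours, in Python, with the illustrative choice $d_1 = d_2 = 1$, since the actual constants are not specified) for $\delta_n = n^{-1/4}$ and $\delta'_n = n^{-1/16}$:

```python
import math

# With delta_n = n^(-1/4) and delta'_n = n^(-1/16), compare
# cost_n = -d2 * delta_n * log(delta_n)  against  lambda_n = d1 * delta'_n**2 / 2.
# Since cost_n ~ n^(-1/4) log n and lambda_n ~ n^(-1/8), lambda_n wins eventually.
for n in (10**3, 10**6, 10**8, 10**10):
    delta, dprime = n ** -0.25, n ** (-1 / 16)
    cost = -delta * math.log(delta)
    lam = dprime ** 2 / 2
    print(f"n=10^{round(math.log10(n))}: cost={cost:.4f} lambda={lam:.4f} ok={cost < lam}")
```

With these unit constants the crossover happens around $n \approx 10^8$; smaller constants $d_2$ or larger $d_1$ move it earlier, but the asymptotic ordering is what the proof uses.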

Using this lemma, we have
$$\begin{aligned}
P(E_2) &\le \sum_{k \le M_m}\sum_{\mathbf{s} \in B_k^{(m)}} P_{X|US}^n\left(A_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}, \mathbf{s}\right) P_S^n(\mathbf{s}) \qquad (146) \\
&\le \sum_{k \le M_m}\sum_{\mathbf{s} \in B_k^{(m)} - G_k^{(m)}} P_{X|US}^n\left(A_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}, \mathbf{s}\right) P_S^n(\mathbf{s}) + P_S^n\left(\bigcup_{k \le M_m} G_k^{(m)}\right) \qquad (147) \\
&< (\epsilon_{1,n})^{1/2} + (\epsilon_{1,n})^{1/4}. \qquad (148)
\end{aligned}$$

So the probability of error for each message is
$$\begin{aligned}
P_e &= P(E_1) + P(E_2) \\
&< 1 - (\epsilon_2 - \tau_2)(1 - \tau_3 - \eta) + (\epsilon_{1,n})^{1/2} + (\epsilon_{1,n})^{1/4} \\
&= 1 - \epsilon_2 + (\epsilon_{1,n})^{1/4} + \left(\tau_2(1 - \tau_3 - \eta) + \epsilon_2(\tau_3 + \eta) + (\epsilon_{1,n})^{1/2}\right), \qquad (149)
\end{aligned}$$


which can be simplified to
$$P_e < 1 - \epsilon_2 + (\epsilon_{1,n})^{1/4} + \gamma. \qquad (150)$$
Here $\gamma$ can be made arbitrarily small by making $\tau_2$, $\tau_3$, $\eta$ sufficiently small. Hence $\forall \epsilon > 0$, for $n$ large enough, we can adjust the parameters $\epsilon_2$ and $\epsilon_{1,n}$ so that
$$\epsilon_2 - (\epsilon_{1,n})^{1/4} - \gamma > 1 - \epsilon, \qquad (151)$$
and then the probability of error for this coding scheme is less than $\epsilon$.

Specifically, if $\epsilon$ is close to 0, then since $\epsilon_{1,n}$ goes to 0, we are using a channel codebook for $\{\mathcal{U}, P_{X|U}, \mathcal{X}\}$. In the meantime we can choose $\epsilon_2$ close to 1, i.e., use a source codebook for the source $S$ with reproduction alphabet $\mathcal{U}$, to make the probability of error of this coding scheme close to 0.

F Proof of Theorem 5

Proof. There are three events in which $(\mathbf{x}, \hat{\mathbf{x}})$ does not satisfy the distortion constraint:

(1) $E_1$: $\mathbf{x} \notin \bigcup_{i=1}^{M} A^{(i)}$,

(2) $E_2$: $\mathbf{s} \notin T_{[S|X]}(\mathbf{x})$,

(3) $E_3$: $\mathbf{u} = f^{(m)}(\phi^{(m)}(\mathbf{x}))$, but $\mathbf{u} \ne f^{(m)}(\psi^{(m)}(\mathbf{s}))$.

The probability of distortion violation $P_{dv}$ is the probability of the union of the above events.

The probability of $E_1$ satisfies (cf. equation (117)):
$$P(E_1) < 1 - \eta(\epsilon_1 - \tau_1), \qquad (152)$$
and it is obvious that for fixed $\tau_5 \in (0,1)$, the probability of $E_2$ satisfies
$$P(E_2 \cap E_1^C) < \tau_5. \qquad (153)$$
We now focus on the calculation of $P(E_3 \cap E_2^C \cap E_1^C)$.

Let the codewords in the $m$-th bin be $\mathbf{u}_k^{(m)}$, $k = 1, 2, \cdots, M_m$, where $M_m = \left|\mathcal{M}_{f^{(m)}}\right|$. Let $A_k^{(m)} \triangleq \phi^{(m)-1}(k)$ and $B_k^{(m)} \triangleq \psi^{(m)-1}(k)$. Note that for a given $\mathbf{x}$, the event $E_1^C \cap E_2^C \cap E_3$ occurs when $\mathbf{x} \in A_k^{(m)}$, $\mathbf{u}_k^{(m)} = f^{(m)}(\phi^{(m)}(\mathbf{x}))$, and $\mathbf{s} \notin B_k^{(m)}$. Thus
$$P(E_1^C \cap E_2^C \cap E_3) = \sum_{m}\sum_{k}\sum_{\mathbf{x} \in A_k^{(m)}} P_{S|X}^n\left(S^n \notin B_k^{(m)}\,\Big|\,\mathbf{x}\right) P_X^n(\mathbf{x}). \qquad (154)$$

Choose the sequence $\epsilon_{2,n} = \exp\{-d_1 n \delta^2_{[S|U],n}\}$ with $d_1 < c_2$, where $c_2$ is the constant in equation (55). Given a sequence $\{\epsilon_n\}$ such that $\epsilon_n = \sqrt{\epsilon_{2,n}} = \exp\{-n\delta^2_{[S|U],n}\,d_1/2\}$, for each $m$ and $k$ define
$$G_k^{(m)} \triangleq \left\{ \mathbf{x} \in A_k^{(m)} : P_{S|UX}^n\left(B_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}, \mathbf{x}\right) = P_{S|X}^n\left(B_k^{(m)C}\,\Big|\,\mathbf{x}\right) > \epsilon_n \right\}.$$
Here we use the Markov chain $U \to X \to S$.

As in the proof of Theorem 4, we first prove the following lemma:

Lemma 11. There exist sequences $\{\delta_{[U],n}\}$, $\{\delta_{[S|U],n}\}$ and $\{\delta_{[X|U],n}\}$ for the definitions of $T_{[U]}$, $T_{[S|U]}$ and $T_{[X|U]}$ respectively such that
$$P_X^n\left(\bigcup_{m \le M}\bigcup_{k \le M_m} G_k^{(m)}\right) \to 0. \qquad (155)$$

Proof. The proof of this lemma is very similar to that of Lemma 10.

We choose $\delta_{[U],n} = \delta_{[X|U],n} = \delta_n$, $\delta_{[X],n} = 2\delta_n|\mathcal{U}|$ and $\delta_{[S|U],n} = \delta'_n$.

For a given sequence $\{\lambda_n\}$ with $\lambda_n = d_1 \delta^2_{[S|U],n}/2$, suppose that
$$P_{X|U}^n\left(G_k^{(m)}\,\Big|\,\mathbf{u}_k^{(m)}\right) > \exp(-n\lambda_n). \qquad (156)$$
Then
$$\begin{aligned}
P_{S|U}^n\left(B_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}\right) &> \sum_{\mathbf{x} \in G_k^{(m)}} P_{S|UX}^n\left(B_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}, \mathbf{x}\right) P_{X|U}^n\left(\mathbf{x}\,\Big|\,\mathbf{u}_k^{(m)}\right) \\
&> \epsilon_n \exp\{-n\lambda_n\} = \exp\left\{-d_1 n \delta^2_{[S|U],n}\right\}. \qquad (157)
\end{aligned}$$
But from the definition of the $m$-th code for the channel $\{\mathcal{U}, P_{S|U}, \mathcal{S}\}$, we have
$$P_{S|U}^n\left(B_k^{(m)C}\,\Big|\,\mathbf{u}_k^{(m)}\right) < \exp\left(-d_1 n \delta^2_{[S|U],n}\right), \qquad (158)$$
leading to a contradiction. This means that
$$P_{X|U}^n\left(G_k^{(m)}\,\Big|\,\mathbf{u}_k^{(m)}\right) < \exp(-n\lambda_n). \qquad (159)$$

For $\mathbf{x} \in G_k^{(m)}$ we have $\mathbf{x} \in T_{[X|U]}\left(\mathbf{u}_k^{(m)}\right)$. From Lemma 6 we have
$$\exp\{-n(H(X|U) + \rho_{1,n})\} \le P_{X|U}^n\left(\mathbf{x}\,\big|\,\mathbf{u}_k^{(m)}\right) \le \exp\{-n(H(X|U) - \rho_{1,n})\}. \qquad (160)$$
Thus,
$$\left|G_k^{(m)}\right| < \exp\{n(H(X|U) + \rho_{1,n} - \lambda_n)\}. \qquad (161)$$
And since the union of all the bins is the codeword set of a code for the channel $\{\mathcal{U}, P_{S|U}, \mathcal{S}\}$, from Theorem 2 we have
$$\left|\bigcup_{m \le M}\bigcup_{k \le M_m} G_k^{(m)}\right| < \exp\{n(H(X|U) + \rho_{1,n} - \lambda_n)\}\exp\{n(I(U;X) + \rho_{2,n})\}. \qquad (162)$$


For $\mathbf{x} \in G_k^{(m)}$ we have $\mathbf{x} \in T_{[X]}$. From Lemma 6 we finally get
$$\begin{aligned}
P_X^n\left(\bigcup_{m \le M}\bigcup_{k \le M_m} G_k^{(m)}\right) &< \left|\bigcup_{m \le M}\bigcup_{k \le M_m} G_k^{(m)}\right|\exp\{-n(H(X) - \rho_{3,n})\} \qquad (163) \\
&< \exp\{-n(\lambda_n - \rho_{1,n} - \rho_{2,n} - \rho_{3,n})\}, \qquad (164)
\end{aligned}$$
where $\rho_{1,n} + \rho_{2,n} + \rho_{3,n} \le -d_2\delta_n\log\delta_n$ for some constant $d_2 > 0$. So by choosing $-d_2\delta_n\log\delta_n < \lambda_n = d_1\delta'^2_n/2$, for example by choosing $\delta_n = \frac{1}{n^{1/4}}$ and $\delta'_n = \frac{1}{n^{1/16}}$, we can find a $\lambda_n$ such that
$$P_{S|X}^n\left(B_k^{(m)C}\,\Big|\,\mathbf{x}\right) < \epsilon_n = (\epsilon_{2,n})^{1/2} \quad \text{for } \mathbf{x} \in A_k^{(m)} - G_k^{(m)}, \qquad (165)$$
$$P_X^n\left(\bigcup_{m \le M}\bigcup_{k \le M_m} G_k^{(m)}\right) < (\epsilon_{2,n})^{1/4} \qquad (166)$$
for large enough $n$.

So for any given $\epsilon > 0$, $\tau > 0$ we have
$$\begin{aligned}
P(E_1^C \cap E_2^C \cap E_3) &= \sum_{m \le M}\sum_{k \le M_m}\sum_{\mathbf{x} \in A_k^{(m)}} P_{S|X}^n\left(S^n \notin B_k^{(m)}\,\Big|\,\mathbf{x}\right) P_X^n(\mathbf{x}) \\
&\le \sum_{m \le M}\sum_{k \le M_m}\sum_{\mathbf{x} \in A_k^{(m)} - G_k^{(m)}} P_{S|X}^n\left(B_k^{(m)C}\,\big|\,\mathbf{x}\right) P_X^n(\mathbf{x}) + P_X^n\left(\bigcup_{m}\bigcup_{k \le M_m} G_k^{(m)}\right) \\
&\le (\epsilon_{2,n})^{1/2} + (\epsilon_{2,n})^{1/4}. \qquad (167)
\end{aligned}$$

So the probability of distortion violation is
$$\begin{aligned}
P_{dv} &= P(E_1) + P(E_2 \cap E_1^C) + P(E_3 \cap E_2^C \cap E_1^C) \\
&< 1 - \eta(\epsilon_1 - \tau_1) + \tau_5 + (\epsilon_{2,n})^{1/2} + (\epsilon_{2,n})^{1/4} \\
&= 1 - \epsilon_1 + (\epsilon_{2,n})^{1/4} + \left((1-\eta)\epsilon_1 + \eta\tau_1 + \tau_5 + (\epsilon_{2,n})^{1/2}\right), \qquad (168)
\end{aligned}$$
which can be simplified to
$$P_{dv} < 1 - \epsilon_1 + (\epsilon_{2,n})^{1/4} + \gamma. \qquad (169)$$
Here $\gamma$ can be made arbitrarily small by making $\tau_1$, $\tau_5$ sufficiently small and $\eta$ sufficiently large. Hence $\forall \epsilon > 0$, for $n$ large enough, we can adjust the parameters $\epsilon_1$ and $\epsilon_{2,n}$ so that
$$\epsilon_1 - (\epsilon_{2,n})^{1/4} - \gamma > 1 - \epsilon, \qquad (170)$$
and then the probability of distortion violation for this coding scheme is less than $\epsilon$.

Specifically, if $\epsilon$ is close to 0, then since $\epsilon_{2,n}$ goes to 0, we are using a channel codebook for $\{\mathcal{U}, P_{S|U}, \mathcal{S}\}$. In the meantime we can choose $\epsilon_1$ close to 1, i.e., use a source codebook for the source $X$ with reproduction alphabet $\mathcal{U}$, to make the probability of distortion violation of this coding scheme close to 0.


G Sketch of Proof of Theorem 6

Proof. We show the proof of Theorem 6 here; the proof of Theorem 7 is very similar.

We choose $\epsilon_{3,n} = \exp\{-d_3 n \delta^2_{[X_2|U_2],n}\}$ and $\epsilon_{1,n} = \exp\{-d_1 n \delta^2_{[X_1|U_1],n}\}$, where $d_3 < c_2$, $d_1 < c_2$, and $c_2$ is the constant in equation (55).

For the first step in the greedy code construction, by reexamining Lemma 9 and the proof of Theorem 1 we can see that as long as $1 - P_{U_2}\left(\bigcup_{j \le i} C_j\right) \ge \exp\{-d_3 n \delta^2_{[X_2|U_2],n}\}$, we can construct the $(i+1)$-th code with
$$|C_{i+1}| \ge \exp\{n(I(U_2;X_2) - \tau)\}. \qquad (171)$$

So when the first step of the greedy algorithm stops, we have a maximal family of $(n, \epsilon_{3,n})$-codes $\left(f_2^{(i)}, \phi_2^{(i)}\right)$ for the DMC $\{\mathcal{U}_2, P_{X_2|U_2}, \mathcal{X}_2\}$ with disjoint codeword sets $C_i$, $i = 1, \cdots, M_2$, satisfying
$$|C_i| \ge \exp\{n(I(U_2;X_2) - \tau)\}, \quad i = 1, \cdots, M_2, \qquad (172)$$
and
$$1 - P_{U_2}(C) < \epsilon_{3,n} = \exp\left\{-d_3 n \delta^2_{[X_2|U_2],n}\right\}. \qquad (173)$$
Hence the randomized code for the DMC $\{\mathcal{U}_2, P_{X_2|U_2}, \mathcal{X}_2\}$ is an $(n, \epsilon_{3,n})$-code. Furthermore, by Lemma 4 we can assume that each $C_i$ consists of codewords of the same type.

From the above discussion we can see that the encoding and decoding of the message $m_1$ of the first user, and all the probability measures involved, are exactly the same as in the Gel'fand-Pinsker problem with $U$, $X$, $S$ replaced by $U_1$, $X_1$, $U_2$ respectively; the calculation of the probability of error of the first user is also the same as in Theorem 4. Calculating the probability of error of the second user is likewise analogous to the Gel'fand-Pinsker case: we know the code for the DMC $\{\mathcal{U}_2, P_{X_2|U_2}, \mathcal{X}_2\}$ is an $(n, \epsilon_{3,n})$-code when the measure on the output space is $P_{X_2|U_2}$, but the measure on the output space used in calculating $P_{e2}$ is $P_{X_2|U_1 U_2}$. We can again use the proof technique of Appendix E, with $U$, $X$, $S$ replaced by $U_2$, $X_2$, $U_1$ respectively, where the distribution on the side information is now the uniform distribution over the set $\{f^{(m_1)}(\psi_1^{(m_1)}(\mathbf{u}_2))\}$ for a fixed $\mathbf{u}_2 \in C$.

References

[1] P. Billingsley, Probability and Measure (3rd edition), Wiley-Interscience, 1995.

[2] T. Berger, "Multiterminal source coding," The Information Theory Approach to Communications (CISM Courses and Lectures No. 29), G. Longo, Ed., Wien and New York: Springer-Verlag, 1977.

[3] G. Caire, S. Shamai and S. Verdu, "Lossless data compression with error correcting codes," in Proc. 2003 IEEE Int. Symp. Information Theory, Yokohama, Japan, June 2003, p. 22.

[4] B. Chen and G. Wornell, "The duality between information embedding and source coding with side information and some applications," IEEE Trans. Inform. Th., vol. 49(5), pp. 1159-1180, May 2003.

[5] T. M. Cover, "Comments on broadcast channels," IEEE Trans. Inform. Th., vol. 44(6), pp. 2524-2530, Oct. 1998.

[6] T. M. Cover and M. Chiang, "Duality between channel capacity and rate distortion with two-sided state information," IEEE Trans. Inform. Th., vol. 48(6), pp. 1629-1638, June 2002.

[7] T. M. Cover and J. A. Thomas, Elements of Information Theory, New York: Wiley, 1991.

[8] I. Csiszar and J. Korner, Information Theory: Coding Theorems for Discrete Memoryless Systems, New York: Academic, 1981.

[9] K. Marton, "A coding theorem for the discrete memoryless broadcast channel," IEEE Trans. Inform. Th., vol. 25(3), pp. 306-311, May 1979.

[10] S. I. Gel'fand and M. S. Pinsker, "Coding for channel with random parameters," Probl. Contr. and Inform. Theory, vol. 9(1), pp. 19-31, 1980.

[11] S. S. Pradhan, J. Chou, and K. Ramchandran, "Duality between source coding and channel coding and its extension to the side information case," IEEE Trans. Inform. Th., vol. 49(5), pp. 1181-1203, May 2003.

[12] S. S. Pradhan and K. Ramchandran, "On functional duality between MIMO source and channel coding with one-sided collaboration," in Proc. 2002 IEEE Information Theory Workshop, Bangalore, India, Oct. 2002, pp. 20-25.

[13] S. Shamai and S. Verdu, "The empirical distribution of good codes," IEEE Trans. Inform. Th., vol. 43(3), pp. 836-846, May 1997.

[14] C. E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," IRE Nat. Conv. Rec., pp. 142-163, Mar. 1959.

[15] J. K. Su, J. J. Eggers and B. Girod, "Illustration of the duality between channel coding and rate distortion with side information," in Proc. Asilomar Conf. Signals, Systems, Computers, Pacific Grove, CA, Nov. 2000.

[16] A. D. Wyner and J. Ziv, "The rate distortion function for source coding with side information at the decoder," IEEE Trans. Inform. Th., vol. 22, pp. 1-10, Jan. 1976.

[17] R. Zamir and A. Cohen, "The rate loss in writing on dirty paper," in DIMACS Workshop on Network Information Theory, Mar. 2003, pp. 17-19.
