
Channel Coding via Robust Optimization
Part 1: The Single-User Channel

Chaithanya Bandi* Dimitris Bertsimas†

August 2015

*Assistant Professor, Kellogg School of Management, Northwestern University, IL 60208, USA. Email: [email protected].
†Boeing Professor of Operations Research, co-director, Operations Research Center, Massachusetts Institute of Technology, E40-147, Cambridge, MA 02139, USA. Email: [email protected].

Abstract

In this paper, we consider the optimal finite-length ($n$) channel coding problem on a single-user Gaussian channel achieving an error probability bound of $\varepsilon$. We build on the ideas of Shannon [1948] and the paradigm of robust optimization (RO) to present novel optimization formulations of the channel coding problem on single-user Gaussian, exponential, and additive uniform noise channels. Solving these problems to optimality leads to upper and lower bounds on the channel capacity for finite code length $n$. As $n \to \infty$, the upper and lower bounds coincide and are equal to the Shannon capacity. The nature and computational complexity of the optimization problems depend on the noise distribution in the following way: (a) for Gaussian channels, the optimization problems involve a rank minimization problem subject to semidefinite constraints, which we solve by iteratively solving semidefinite optimization problems; and (b) for exponential and additive uniform noise channels, the optimization problems are mixed integer linear optimization problems, which we solve using commercial solvers. Because of the size and complexity of these formulations, we do not solve them to provable optimality. Still, we provide a feasible code that leads to a valid lower bound on channel capacity, while we only provide approximations for the upper bound. We report these computations for $n = 140$ for Gaussian channels and $n = 300$ for exponential channels.

1 The Channel Coding Problem

The central problem of communications is how to transmit information reliably through a noisy (and thus unreliable) communication channel. Shannon's "A mathematical theory of communication," published in 1948, marked the beginning of information theory. In this seminal paper he developed a framework to precisely define the intuitive notion of information, which in turn makes it possible to mathematically capture the notions of a communication channel and its capacity. Shannon showed that (a) there is an upper bound (the so-called channel capacity) on the rate of reliable transmission of information, and (b) there exists a code that leads to a rate of transmission arbitrarily close to the channel capacity while achieving probabilities of error arbitrarily close to zero. Since then, Shannon's approach has been used to characterize the fundamental limits of communication for various kinds of channels. However, the communication limits of many common channels, such as the interference and broadcast channels, still remain unknown. Indeed, a general theory of communication limits on networks of channels is still largely open. Techniques such as random encoding that are effective for single-user channels no longer allow us to characterize the capacity regions of complex channels.

The key difficulty in these problems is the analytical complexity of the optimization problems involved, which seek to maximize the number of symbols that can be transmitted over a channel while achieving a bounded probability of error. The probability of error for a given code involves an $n$-dimensional integral and is therefore computationally expensive to evaluate, and optimizing a code directly is quite complex. In this paper, we consider these optimization problems and use a robust optimization (RO) framework to reformulate them into optimization problems that are amenable to solution by state-of-the-art solvers. Solving these problems to optimality leads to upper and lower bounds on the channel capacity for finite code length $n$. As $n \to \infty$, the upper and lower bounds coincide and are equal to the Shannon capacity. Because of the size and complexity of these formulations, we do not solve them to provable optimality. Still, we provide a feasible code that leads to a valid lower bound on channel capacity, while we only provide approximations for the upper bound. We report these computations for $n = 140$ for Gaussian channels and $n = 300$ for exponential channels.

In the present paper, which represents Part I of our work, we present the key steps in our approach and show how we reformulate the Gaussian channel coding problem as a rank minimization problem with semidefinite optimization constraints, and the exponential and additive uniform noise channel coding problems as mixed integer linear optimization problems. In Part II of our work (Bandi and Bertsimas [2015]), we address multi-user channels with interference and show that they can be reformulated in the same manner as in the single-user channel case. We next briefly describe the philosophy as well as the key ingredients of our approach:

1. Optimization formulation of decoding constraints: By identifying that decoding is a robustness property, we formulate the probabilistic decoding constraints as robustness constraints of an optimization formulation. For single-user Gaussian channels, it is well known that the minimum distance decoder is an optimal decoder (Cover and Thomas [2006]), and we obtain robust quadratic constraints to represent the decoding property.

2. Using typical sets as uncertainty sets: We use the original idea of Shannon [1948] that, for the purpose of computing the capacity and constructing the underlying code, it suffices to consider noise sequences that belong to the so-called "typical set". We interpret these typical sets as uncertainty sets in a RO setting. Imposing that the decoding constraints found in Step 1 hold for all noise sequences in the typical set naturally leads to a RO formulation.

3. Using binary optimization to count probabilities: Since our objective is to solve the same problems, involving probabilistic primitives, that the information theory community has addressed over the years, we need to be able to measure the probability of error. In order to achieve this we need to count the frequency at which errors occur, which we accomplish using binary optimization.

4. Using semidefinite or binary optimization to model non-convexities: We reformulate the underlying RO problem as either a mixed binary linear optimization problem (for non-Gaussian channels) or a non-convex quadratic optimization problem (for Gaussian channels). For the case of Gaussian channels, we reformulate the non-convex quadratic optimization problem as a rank minimization problem with semidefinite constraints. We then use the log-det method developed in Fazel et al. [2003] to solve such problems as a sequence of semidefinite optimization problems.


1.1 Problem Definition and Notation

Throughout the paper, we denote scalar quantities by non-boldface symbols (e.g., $x \in \mathbb{R}$, $k \in \mathbb{N}$), vector quantities by boldface symbols (e.g., $\mathbf{x} \in \mathbb{R}^n$, $n > 1$), and matrices by uppercase boldface symbols (e.g., $\mathbf{A} \in \mathbb{R}^{n \times m}$, $n > 1$, $m > 1$). We denote scalar random variables by $z$ and vector random variables by $\mathbf{z}$. We use the notation $\mathbf{z} \sim N(\mathbf{0}, \sigma \cdot \mathbf{I})$ to denote that each component of $\mathbf{z}$ is normally distributed with mean 0 and standard deviation $\sigma$. For common information-theoretic objects, we use the notation from Cover and Thomas [2006].

To make the paper self-contained, we define the notion of a communication channel and the related channel coding problem. Users send and receive messages over a communication channel. For example, in a single-user Gaussian channel, a sender transmits a signal $\mathbf{x}_i \in \mathbb{R}^n$, but the receiver receives
\[
\mathbf{y} = \mathbf{x}_i + \mathbf{z},
\]
where the noise $\mathbf{z} \sim N(\mathbf{0}, \sigma \cdot \mathbf{I})$.

Given a communication channel, a sender seeks to transmit messages from a message set $\mathcal{M} = \{1, \dots, M\}$ by coding the messages using codewords of length $n$, according to a code $\mathcal{C}$. The inputs of such a code $\mathcal{C}$ are:

(a) The length $n$ of the codewords.
(b) The number $M = |\mathcal{M}| = 2^{nR}$ of codewords. The quantity $R = \log_2 M / n$ is called the rate of the code.
(c) The power constraint $P$ of the sender.
(d) The noise standard deviation $\sigma$.
(e) The average probability of error $\varepsilon > 0$ (see Eq. (1)) that the user tolerates.

The outputs of $\mathcal{C}[n, R, P, \sigma, \varepsilon]$ are:

(a) A codebook $\mathcal{B}$, which is a set of $M$ codewords $\mathbf{x}_i$, $i = 1, \dots, M$, satisfying $\|\mathbf{x}_i\|^2 \le nP$ for all $i$.
(b) A decoding function $g : \mathbb{R}^n \to \{1, 2, \dots, M\}$ that maps each received word $\mathbf{y}$ to one of the codewords in $\mathcal{B}$, while satisfying the error tolerance $\varepsilon$. That is, for each $i = 1, \dots, M$, we must have
\[
\mathbb{P}\left[g(\mathbf{x}_i + \mathbf{z}) \neq i\right] \le \varepsilon, \tag{1}
\]
with $\mathbf{z} \sim N(\mathbf{0}, \sigma \cdot \mathbf{I})$.

The finite capacity region of a single-user Gaussian channel, $\mathcal{R}_n[P, \sigma, \varepsilon]$, is the set of all rates $R$ such that there exists a code $\mathcal{C}[n, R, P, \sigma, \varepsilon]$. In the limit that $n \to \infty$ and $\varepsilon \to 0$, the capacity region of a single-user Gaussian channel is called the asymptotic capacity region and is denoted by $\mathcal{R}[P, \sigma]$. In the next section, we present a brief review of the information theory literature organized around results on different channels. We also review RO, which is the key methodology that we use in this paper.

1.2 Relevant Literature

For a single-user communication channel, Shannon [1948] showed that there exists a maximum rate $C$ associated with every communication channel, above which no reliable transmission is possible and below which there exists a code achieving small error probabilities. In particular, Shannon showed that the capacity $C$ is given by
\[
C = \sup I(X; Y),
\]


where $I(X; Y)$ is the mutual information between the random variables $X$ and $Y$, and the supremum is taken over all possible input distributions of $X$. He showed that arbitrarily small probabilities of error can be achieved by using random encoding with maximum likelihood decoding whenever the rate of transmission is less than $C$. For the case of the Gaussian channel, where the noise is normally distributed with mean 0 and standard deviation $\sigma$ and the sender has a power constraint $P$, Shannon obtained that the asymptotic capacity of the channel is
\[
C = \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right) \text{ bits per channel use.} \tag{2}
\]

For finite code length $n$, Polyanskiy et al. [2010] provide lower and upper bounds for the channel capacity. For a broad review of information theory, we refer the reader to Verdú [1998], Verdú and McLaughlin [2000], and Cover and Thomas [2006].
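As a quick numeric illustration of Eq. (2) (the values below are illustrative and not data from the paper), a unit power constraint and unit noise variance give half a bit per channel use:

```python
import numpy as np

# Asymptotic Shannon capacity of the Gaussian channel, Eq. (2).
P, sigma = 1.0, 1.0                      # illustrative values
C = 0.5 * np.log2(1 + P / sigma**2)      # bits per channel use
print(C)                                 # 0.5
```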

A Review of Binary Optimization

The modern theory of linear optimization (LO) started in the 1940s when George B. Dantzig proposed the simplex algorithm (Dantzig [1947]) for solving LO problems of the form
\[
\begin{aligned}
\max\;& \mathbf{c}'\mathbf{x} \\
\text{s.t.}\;& \mathbf{A}\mathbf{x} = \mathbf{b}, \\
& \mathbf{x} \ge \mathbf{0},
\end{aligned}
\]
where $\mathbf{c} \in \mathbb{R}^n$, $\mathbf{A} \in \mathbb{R}^{m \times n}$, $\mathbf{b} \in \mathbb{R}^m$ are given data and $\mathbf{x} \in \mathbb{R}^n$ is a vector of decision variables. The simplex algorithm proved practically efficient and it is, to this date, the main algorithm for solving LO problems. Today, commercial solvers such as Cplex [2014] and Gurobi [2010] routinely solve problems with tens of millions of variables and constraints. For a review, see Bertsimas and Tsitsiklis [1997].

A natural and very relevant extension of LO is the class of mixed binary optimization (MBO) problems
\[
\begin{aligned}
\max\;& \mathbf{c}'\mathbf{x} + \mathbf{d}'\mathbf{y} \\
\text{s.t.}\;& \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} = \mathbf{b}, \\
& \mathbf{x} \ge \mathbf{0},\; \mathbf{y} \in \{0, 1\}^k,
\end{aligned} \tag{3}
\]
where, in addition to the continuous variables $\mathbf{x}$, we also have binary variables $\mathbf{y}$. In the last six decades, significant progress has been made in solving MBO problems. Using commercial codes like Cplex [2014] and Gurobi [2010], we can routinely solve problems involving hundreds of thousands of binary variables and millions of continuous variables and constraints. In this paper, we utilize the ability of commercial solvers to solve large-scale instances of problem (3); a small illustrative instance is sketched below. For a review of MBO, see Bertsimas and Weismantel [2005].
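For concreteness, the following is a minimal sketch of a tiny instance of problem (3), assuming the open-source PuLP modeler with its bundled CBC solver in place of the commercial codes cited above; all data are illustrative.

```python
import pulp

# A tiny mixed binary optimization instance of the form (3):
#   max c'x + d'y   s.t.   Ax + By = b,   x >= 0,   y binary.
prob = pulp.LpProblem("mbo_toy", pulp.LpMaximize)
x = pulp.LpVariable("x", lowBound=0)          # continuous variable
y = pulp.LpVariable("y", cat="Binary")        # binary variable

prob += 3 * x + 2 * y                         # objective c'x + d'y
prob += x + 4 * y == 4                        # equality constraint Ax + By = b

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(x.value(), y.value())                   # optimum here: x = 4, y = 0
```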

A Review of Robust Optimization

RO is one of the fastest growing areas of optimization in the last decade. It addresses the problem of optimization under uncertainty, in which the uncertainty model is not stochastic, but rather deterministic and set-based. RO models are typically polynomial-time solvable, but may lead to solutions that are too conservative. To alleviate conservatism, Ben-Tal and Nemirovski [2000, 1998, 1999] and El-Ghaoui and Lebret [1997], El-Ghaoui et al. [1998] proposed linear optimization models with ellipsoidal uncertainty sets, whose robust counterparts correspond to quadratic optimization problems. Bertsimas and Sim [2003, 2004] proposed RO models with polyhedral uncertainty sets that can model linear/integer variables, whose robust counterparts correspond to linear/integer optimization models. For a more thorough review, we refer the reader to Bertsimas et al. [2011] and Ben-Tal et al. [2009].

1.3 Contributions and Structure

The present paper is part of a broader research effort (Bandi and Bertsimas [2012]) to investigate a RO approach to classical problems with probabilistic primitives, like those of information theory. In our approach we replace the probabilistic primitives with uncertainty-set-based primitives and interpret their key properties as robustness properties. Our aim is to show that the resulting performance analysis or optimization questions become computationally tractable. We have applied this program to single-class queueing networks in Bandi et al. [2015], to the pricing of multi-dimensional options in Bandi and Bertsimas [2014b], and to the problem of mechanism design in Bandi and Bertsimas [2014a]. In the present paper, we consider the channel coding problem and present a robust optimization approach to it.

We begin by providing algorithms to compute the capacity region and to find optimal codes for the single-user Gaussian channel. We first present a RO approach to this channel that recovers the known asymptotic capacity and finds lower bounds for finite code length by solving a rank minimization problem subject to semidefinite constraints. We report computational results that show that the RO approach is computationally tractable for $n = 140$ for single-user Gaussian channels and $n = 300$ for single-user additive exponential noise channels. In Part II of this work, we extend this approach to the two-user Gaussian interference channel, the two-user multi-access and broadcast Gaussian channels, and multi-user channels with exponentially distributed noise.

The structure of the paper is as follows. In Section 2, we present how decoding is a robustness property and the connection between typical sets and uncertainty sets. In Section 3, we introduce the RO approach for the single-user Gaussian channel. In Section 4, we examine how the resulting optimization problems depend on the nature of the probabilistic primitives and present mixed binary linear optimization problems to compute lower and upper bounds for the capacity region of single-user channels with exponentially distributed noise. In Section 5, we present computational results for the single-user Gaussian and additive exponential noise channels.

2 Robustness and Information Theory

In this section, we discuss how a robustness perspective can shed new light on information theory by interpreting (a) decoding as a robustness property and (b) typical sets as uncertainty sets. We consider a single-user channel in which the noise is distributed according to a probability density function (pdf) $f(\cdot)$.

2.1 Decoding as a Robustness Property

The Maximum Likelihood (ML) decoder is an optimal decoder for any single-user channel (see Cover and Thomas [2006]); that is, there always exists an optimal code which uses ML as the decoding function. An ML decoder is characterized by the decoding function $g_{ML}(\cdot)$ given by
\[
g_{ML}(\mathbf{y}) = \arg\max_{\mathbf{x}_i \in \mathcal{B}} \mathbb{P}\left[\mathbf{y} \mid \mathbf{x}_i \text{ was sent}\right] = \arg\max_{\mathbf{x}_i \in \mathcal{B}} \prod_{j=1}^n f(y_j - x_{ij}).
\]

This allows us to formulate the coding problem as an optimization problem by restricting our attention to codes that are optimal with respect to the ML decoder. In particular, we constrain the codewords to satisfy
\[
\prod_{j=1}^n f(x_{ij} + z_j - x_{i'j}) \le \prod_{j=1}^n f(z_j), \quad \forall i,\; \forall i' \neq i,\; \forall \mathbf{z} \in \mathcal{U}_i, \tag{4}
\]
where $\mathcal{U}_i$ is an appropriately chosen uncertainty set (see the discussion in Section 2.2) such that
\[
\mathbb{P}\left[\mathbf{z} \in \mathcal{U}_i\right] \ge 1 - \varepsilon, \tag{5}
\]
and $\mathcal{U}_i$ may depend on $\mathbf{x}_i$. Note that the constraints (4) are expressed in the language of robust optimization: imposing them for all $\mathbf{z} \in \mathcal{U}_i$ leads to a robust optimization problem.

2.2 Typical Sets as Uncertainty Sets

Given a pdf $f(\cdot)$, Shannon [1948], in his study of the asymptotic capacity region of a communication channel, introduced the notion of a typical set $\mathcal{U}^f_\varepsilon$ in order to capture the following two properties:

(a) $\mathbb{P}\left[\mathbf{z} \in \mathcal{U}^f_\varepsilon\right] = 1 - \varepsilon$, with $\varepsilon \to 0$ as $n \to \infty$.

(b) The conditional pdf $h(\mathbf{z}) = f(\mathbf{z} \mid \mathbf{z} \in \mathcal{U}^f_\varepsilon)$ satisfies
\[
\left| \frac{1}{n} \log h(\mathbf{z}) + H_f \right| \le \phi(\varepsilon),
\]
for some $H_f$ (the differential entropy of the pdf $f(\cdot)$) and $\phi(\varepsilon) \to 0$ as $n \to \infty$, where $\mathbf{z} = [z_1, \dots, z_n]$ and $z_i \sim f(\cdot)$ for all $i = 1, \dots, n$.

Property (a) means that the typical set has probability nearly one, while Property (b) means that all elements of the typical set are nearly equiprobable; see Cover and Thomas [2006]. We next show that the typical set of a pdf $f(\cdot)$ is given by
\[
\mathcal{U}^f_\varepsilon = \left\{ \mathbf{z} \;\middle|\; -\Gamma^f_\varepsilon \le \frac{\sum_{i=1}^n \log f(z_i) + nH_f}{\sigma_f \sqrt{n}} \le \Gamma^f_\varepsilon \right\}, \tag{6}
\]
where
\[
H_f = -\int_{-\infty}^{\infty} f(x) \log f(x)\, dx, \qquad \sigma_f^2 = \int_{-\infty}^{\infty} f(x) \left( \log f(x) + H_f \right)^2 dx,
\]
and $\Gamma^f_\varepsilon$ is chosen such that
\[
\mathbb{P}\left[ \left| \sum_{i=1}^n \log f(z_i) + nH_f \right| \le \Gamma^f_\varepsilon \cdot \sigma_f \sqrt{n} \right] = 1 - \varepsilon. \tag{7}
\]
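The threshold $\Gamma^f_\varepsilon$ in Eq. (7) is a one-dimensional quantile and is easy to estimate numerically. Below is a minimal Monte Carlo sketch (our illustration, not from the paper) for the exponential pdf $f(x) = e^{-x}$, for which $H_f = 1$ and $\sigma_f = 1$ in closed form:

```python
import numpy as np

# Monte Carlo estimate of Gamma_eps^f in Eq. (7) for f(x) = exp(-x), x >= 0.
# For this pdf, log f(z) = -z, the differential entropy is H_f = 1, and sigma_f = 1.
rng = np.random.default_rng(0)
n, eps, trials = 100, 0.001, 200_000
H_f, sigma_f = 1.0, 1.0

z = rng.exponential(scale=1.0, size=(trials, n))
stat = (-z.sum(axis=1) + n * H_f) / (sigma_f * np.sqrt(n))  # normalized log-likelihood
gamma = np.quantile(np.abs(stat), 1 - eps)                  # smallest Gamma covering 1 - eps
print(gamma)
```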


Proposition 1. For a pdf $f(\cdot)$, $\mathcal{U}^f_\varepsilon$ defined in Eq. (6) satisfies:

(a) $\mathbb{P}\left[\mathbf{z} \notin \mathcal{U}^f_\varepsilon\right] \le \varepsilon$.

(b) The conditional pdf $h(\mathbf{z}) = f(\mathbf{z} \mid \mathbf{z} \in \mathcal{U}^f_\varepsilon)$ satisfies $\left| \frac{1}{n} \log h(\mathbf{z}) + H_f \right| \le \phi(\varepsilon)$, with $\phi(\varepsilon) \to 0$ as $n \to \infty$.

The proof is omitted as it is elementary. The typical sets for the normal, exponential, uniform and binary distributions are presented below.

Corollary 1 (Typical Sets for the Normal, Exponential, Uniform and Binary Distributions).

(a) The typical set for normally distributed i.i.d. random variables $z_i \sim N(0, \sigma)$ is given by
\[
\mathcal{U}^G_\varepsilon = \left\{ \mathbf{z} \,\middle|\, -\Gamma^G_\varepsilon \le \|\mathbf{z}\|^2 - n\sigma^2 \le \Gamma^G_\varepsilon \right\}. \tag{8}
\]

(b) The typical set for correlated normally distributed random variables $\mathbf{z} \sim N(\mathbf{0}, \boldsymbol{\Sigma})$ is given by
\[
\mathcal{U}^{CG}_\varepsilon = \left\{ \mathbf{z} \,\middle|\, -\Gamma^{CG}_\varepsilon \le \|\boldsymbol{\Sigma}^{-1}\mathbf{z}\|^2 - n \le \Gamma^{CG}_\varepsilon \right\}. \tag{9}
\]

(c) The typical set for exponentially distributed i.i.d. random variables $z_i \sim \mathrm{Exp}(\lambda)$ is given by
\[
\mathcal{U}^E_\varepsilon = \left\{ \mathbf{z} \,\middle|\, \frac{n}{\lambda} - \frac{\sqrt{n}}{\lambda}\,\Gamma^E_\varepsilon \le \sum_{j=1}^n z_j \le \frac{n}{\lambda} + \frac{\sqrt{n}}{\lambda}\,\Gamma^E_\varepsilon, \;\; \mathbf{z} \ge \mathbf{0} \right\}. \tag{10}
\]

(d) The typical set for uniformly distributed i.i.d. random variables $z_i \sim U[a, b]$ is given by
\[
\mathcal{U}^U_\varepsilon = \left\{ \mathbf{z} \,\middle|\, n\,\frac{a+b}{2} - \Gamma^U_\varepsilon \sqrt{n} \le \sum_{j=1}^n z_j \le n\,\frac{a+b}{2} + \Gamma^U_\varepsilon \sqrt{n}, \;\; a \le z_j \le b,\; j = 1, \dots, n \right\}. \tag{11}
\]

(e) The typical set for binary i.i.d. random variables $z_i \sim \mathrm{Bin}(p)$ is given by
\[
\mathcal{U}^B_\varepsilon = \left\{ \mathbf{z} \,\middle|\, np - \Gamma^B_\varepsilon \sqrt{n} \le \sum_{j=1}^n z_j \le np + \Gamma^B_\varepsilon \sqrt{n}, \;\; z_j \in \{0, 1\},\; j = 1, \dots, n \right\}. \tag{12}
\]

Here $\Gamma^G_\varepsilon$, $\Gamma^{CG}_\varepsilon$, $\Gamma^E_\varepsilon$, $\Gamma^U_\varepsilon$, $\Gamma^B_\varepsilon$ are chosen such that
\[
\mathbb{P}\left[\mathcal{U}^G_\varepsilon\right] = \mathbb{P}\left[\mathcal{U}^{CG}_\varepsilon\right] = \mathbb{P}\left[\mathcal{U}^E_\varepsilon\right] = \mathbb{P}\left[\mathcal{U}^U_\varepsilon\right] = \mathbb{P}\left[\mathcal{U}^B_\varepsilon\right] = 1 - \varepsilon. \tag{13}
\]
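For the Gaussian set (8), $\Gamma^G_\varepsilon$ can be computed exactly from chi-square quantiles, since $\|\mathbf{z}\|^2/\sigma^2 \sim \chi^2_n$. The helper below is our illustration, not the paper's code:

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical helper: computes Gamma_eps^G in Eq. (8) for i.i.d. N(0, sigma)
# noise, using the fact that ||z||^2 / sigma^2 follows a chi-square law with n dof.
def gamma_gaussian(n: int, sigma: float, eps: float) -> float:
    def coverage_gap(gamma: float) -> float:
        hi = stats.chi2.cdf((n * sigma**2 + gamma) / sigma**2, df=n)
        lo = stats.chi2.cdf((n * sigma**2 - gamma) / sigma**2, df=n)
        return (hi - lo) - (1.0 - eps)   # P[|  ||z||^2 - n sigma^2 | <= Gamma] - (1 - eps)

    # Coverage increases monotonically in gamma, so bracket the root and bisect.
    return optimize.brentq(coverage_gap, 0.0, 20.0 * sigma**2 * np.sqrt(n) + 10.0)

print(gamma_gaussian(n=100, sigma=1.0, eps=0.001))
```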


3 The Single-User Gaussian Channel

We consider a discrete-time memoryless additive Gaussian channel, in which a single user transmits a codeword $\mathbf{x}_i$ from a codebook $\mathcal{B}$. This codeword is transformed by the channel into $\mathbf{y} \in \mathbb{R}^n$ according to
\[
\mathbf{y} = \mathbf{x}_i + \mathbf{z}_G,
\]
where $\mathbf{z}_G \sim N(\mathbf{0}, \sigma \cdot \mathbf{I})$. The codewords are subject to an average power constraint $P$, that is, for any codeword $\mathbf{x}_i \in \mathcal{B}$ we require that
\[
\|\mathbf{x}_i\|^2 \le nP.
\]
Suppose $\mathcal{B}$ consists of $M$ codewords of length $n$ and $\mathcal{M} = \{1, \dots, M\}$. In what follows, we assume the code length is $n$ unless otherwise mentioned, and we let $\mathbf{z}_G \sim N(\mathbf{0}, \sigma \cdot \mathbf{I})$ be the $n$-dimensional vector of i.i.d. normal random variables.

3.1 An Optimization Formulation of the Coding Problem

We begin by observing that the maximum likelihood decoder for a single-user Gaussian channel reduces to the minimum distance decoder given by
\[
g_0(\mathbf{y}) = \arg\min_{i \in \mathcal{M}} \|\mathbf{y} - \mathbf{x}_i\|.
\]
Using this decoder, we want the codewords to satisfy the constraint
\[
\|\mathbf{x}_i + \mathbf{z} - \mathbf{x}_{i'}\| \ge \|\mathbf{z}\|, \quad \forall \mathbf{z} \in \mathcal{U}^i_\varepsilon,\; \forall i,\; i' \neq i, \tag{14}
\]
where $\mathcal{U}^i_\varepsilon$ is a set of noise vectors with probability mass $1 - \varepsilon$. These constraints ensure that, if $\mathbf{x}_i$ was transmitted by the user and was received as $\mathbf{y} = \mathbf{x}_i + \mathbf{z}$ for some $\mathbf{z} \in \mathcal{U}^i_\varepsilon$, then the distance between the received word $\mathbf{y}$ and any other codeword $\mathbf{x}_{i'}$ is greater than the distance between $\mathbf{y}$ and $\mathbf{x}_i$. Note that (14) is naturally expressed as a robustness property. The channel coding problem is, thus, given by
\[
\begin{aligned}
\max\;& |\mathcal{M}| && (15) \\
\text{s.t.}\;& \|\mathbf{x}_i + \mathbf{z} - \mathbf{x}_{i'}\| \ge \|\mathbf{z}\|, && \forall \mathbf{z} \in \mathcal{U}^i_\varepsilon,\; \forall i, i' \in \mathcal{M},\; i' \neq i, \\
& \|\mathbf{x}_i\|^2 \le nP, && \forall i \in \mathcal{M}.
\end{aligned}
\]
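A minimal sketch of the minimum distance decoder $g_0$ on one channel use (illustrative codebook and parameters, not the paper's construction):

```python
import numpy as np

# One channel use with the minimum distance decoder g_0.
rng = np.random.default_rng(1)
n, M, sigma = 8, 4, 0.3
codebook = rng.standard_normal((M, n))
codebook *= np.sqrt(n) / np.linalg.norm(codebook, axis=1, keepdims=True)  # ||x_i||^2 = n, i.e. P = 1

i = 2
y = codebook[i] + sigma * rng.standard_normal(n)                # y = x_i + z_G

decoded = int(np.argmin(np.linalg.norm(y - codebook, axis=1)))  # g_0(y) = argmin_i ||y - x_i||
print(decoded == i)
```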

We next present the overall optimization formulation and show how to model the decoding constraint in Eq. (14) via the constraints in Eqs. (23) and (24), which use the typical set for the Gaussian distribution given by Eq. (8). We begin by introducing a parameter $\nu$ that regulates the tradeoff between the accuracy of the computation of the finite capacity of the single-user Gaussian channel and the complexity of computing it; see also the discussion after Theorem 1. Given inputs $n, R, P, \sigma, \varepsilon, \nu$, we compute the following quantities:

1. The parameter $\gamma_\varepsilon$, which we choose so that
\[
\mathbb{P}\left[\|\mathbf{z}_G\| \le \gamma_\varepsilon\right] \ge 1 - \varepsilon. \tag{16}
\]

2. The parameter $T$ given by
\[
T = \left( \frac{1+\nu}{\zeta\nu} \cdot \frac{\gamma_\varepsilon}{\sqrt{n}} \right)^n, \quad \text{with } \zeta = \frac{\sigma}{\sqrt{n}} \cdot \Phi^{-1}\left(1 - \varepsilon(1 - \delta(\nu, n))\right), \tag{17}
\]
where $\Phi(\cdot)$ is the cdf of a standard normal and
\[
\delta(\nu, n) = \exp\left( -n \cdot \frac{r - \log(1+r)}{2(1+3\nu)^2 \sigma^2} \right), \quad \text{with } r = \sigma^2\left( (1+3\nu)^2 - (1+2\nu)^2 \right),
\]
and we let $\mathcal{T} = \{1, \dots, T\}$.

3. The parameter $M_0$ given by
\[
M_0 = (1+\nu) \cdot \gamma_\varepsilon. \tag{18}
\]

4. The set of vectors
\[
\mathcal{Z} = \{\mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_T\} \tag{19}
\]
with $\|\mathbf{z}_t\| = M_0$, $t = 1, \dots, T$, that are the deterministic equivalent of being uniformly distributed on the surface of the $n$-dimensional sphere of radius $M_0$. The construction of such vectors has been studied under the umbrella of rate distortion theory (see Wyner [1967], Gray and Neuhoff [1998]). In Appendix B, we present an algorithm due to Lloyd [1982] to compute these vectors; a simple sampling-based stand-in is sketched below.
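As a simple stand-in for the Lloyd-based construction of Appendix B (which we do not reproduce here), near-uniform points on the sphere of radius $M_0$ can be drawn by normalizing i.i.d. Gaussian vectors, using the spherical symmetry in Proposition 5(b):

```python
import numpy as np

# A minimal sketch: draw T points approximately uniformly on the n-sphere of
# radius M0 by normalizing i.i.d. Gaussian vectors (a stand-in for the
# Lloyd [1982] construction referenced in Appendix B).
def sphere_points(T: int, n: int, M0: float, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((T, n))
    return M0 * z / np.linalg.norm(z, axis=1, keepdims=True)  # each row has norm M0

Z = sphere_points(T=1000, n=10, M0=3.5)
print(np.allclose(np.linalg.norm(Z, axis=1), 3.5))
```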


Code Construction

We next present an algorithm to construct codewords $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$ for the sender. Using the minimum distance decoder, we want to ensure that the average probability of error is at most $\varepsilon$, that is,
\[
\frac{1}{M} \sum_{i \in \mathcal{M}} \mathbb{P}\left[g(\mathbf{y}) \neq i \mid m = i\right] \le \varepsilon. \tag{20}
\]
In order to achieve this, we define the following "counting" variables $\{v_{it}\}_{i \in \mathcal{M}, t \in \mathcal{T}}$:
\[
v_{it} = \begin{cases} 1, & \text{if } \|\mathbf{x}_i + \mathbf{z}_t - \mathbf{x}_{i'}\| \ge \|\mathbf{z}_t\|,\; \forall i' \in \mathcal{M}, \\ 0, & \text{otherwise.} \end{cases} \tag{21}
\]
The Encoding Algorithm is, thus, given by the feasibility problem:
\[
\begin{aligned}
& \|\mathbf{x}_i\|^2 \le nP, && \forall i \in \mathcal{M}, && (22) \\
& \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| + (1 - v_{it}) M_0 \ge \|\mathbf{z}_t\|, && \forall t \in \mathcal{T},\; \forall i, k \in \mathcal{M},\; k \neq i, && (23) \\
& \sum_{t=1}^T v_{it} \ge (1-\varepsilon) T, && \forall i \in \mathcal{M}, && (24) \\
& \|\mathbf{x}_i - \mathbf{x}_k\| \ge 2\zeta\sqrt{n}, && \forall i, k \in \mathcal{M},\; k \neq i, && (25) \\
& v_{it} \in \{0, 1\}, && \forall i \in \mathcal{M},\; t \in \mathcal{T}. && (26)
\end{aligned}
\]
We next explain each of the constraints in the Encoding Algorithm:

1. Constraints (22) impose the power constraints on the codewords.

2. Constraints (23) implement the decoding rule for noise vector $\mathbf{z}_t$: when $v_{it} = 1$ they enforce the minimum distance property, and when $v_{it} = 0$ the big-M term $M_0$ renders them vacuous.

3. Constraints (24) impose that the decoding constraint be valid for at least $(1-\varepsilon)T$ of the $\mathbf{z}_t$'s. In other words, a fraction $\varepsilon \cdot T$ of the noise vectors $\mathbf{z}_t$ are not constrained to satisfy the minimum distance property.

4. Constraints (25) ensure that the codewords are separated by a minimum distance, needed to obtain a decoding error probability of at most $\varepsilon$.

Semidefinite Programming Reformulation

We next reformulate the feasibility problem (22)-(26) as a semidefinite optimization problem with rank constraints. We do this in two steps: (a) reformulate constraints (22)-(26) into quadratic constraints using Proposition 2, and (b) reformulate the quadratic constraints into semidefinite constraints using Proposition 3.

Proposition 2.
(a) Constraint (23) is equivalent to the constraint
\[
\|\mathbf{x}_i - \mathbf{x}_k\|^2 + M_0^2 (1 - v_{it}) \ge 2 (\mathbf{x}_k - \mathbf{x}_i)' \mathbf{z}_t, \quad \forall t \in \mathcal{T},\; \forall i, k \in \mathcal{M},\; k \neq i.
\]
(b) The constraint $v_{it} \in \{0, 1\}$ is equivalent to the constraint $v_{it}^2 = v_{it}$.


Proof. (a) When $v_{it} = 1$,
\[
\begin{aligned}
\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| + (1 - v_{it}) M_0 \ge \|\mathbf{z}_t\|
&\iff \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\|^2 \ge \|\mathbf{z}_t\|^2 \\
&\iff \|\mathbf{x}_i - \mathbf{x}_k\|^2 \ge 2 (\mathbf{x}_k - \mathbf{x}_i)' \mathbf{z}_t.
\end{aligned}
\]
On the other hand, when $v_{it} = 0$,
\[
\begin{aligned}
\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| + (1 - v_{it}) M_0 \ge \|\mathbf{z}_t\|
&\iff \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \ge \|\mathbf{z}_t\| - M_0 \\
&\iff \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\|^2 \ge \|\mathbf{z}_t\|^2 + M_0^2 - 2 M_0 \|\mathbf{z}_t\| \\
&\iff \|\mathbf{x}_i - \mathbf{x}_k\|^2 + M_0^2 \ge 2 (\mathbf{x}_k - \mathbf{x}_i)' \mathbf{z}_t,
\end{aligned}
\]
where the last equivalence follows from $\|\mathbf{z}_t\| = M_0$. Therefore,
\[
\left\{ \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| + (1 - v_{it}) M_0 \ge \|\mathbf{z}_t\| \right\} \iff \left\{ \|\mathbf{x}_i - \mathbf{x}_k\|^2 + M_0^2 (1 - v_{it}) \ge 2 (\mathbf{x}_k - \mathbf{x}_i)' \mathbf{z}_t \right\}.
\]
(b) We have that $v_{it} \in \{0, 1\}$ if and only if $v_{it}^2 = v_{it}$.

Using Proposition 2, we convert the feasibility problem (22)-(26) to a non-convex quadratic optimization problem. Let $\mathcal{K} = \{1, \dots, K\}$.

Proposition 3. The set of quadratic, possibly non-convex, constraints
\[
f_k(\mathbf{y}) = \mathbf{y}' \mathbf{A}_k \mathbf{y} + 2 \mathbf{b}_k' \mathbf{y} + c_k \le 0, \quad \forall k \in \mathcal{K}, \tag{27}
\]
is equivalent to the semidefinite optimization problem
\[
\bar{\mathbf{A}}_k \bullet \mathbf{Y} \le 0, \;\forall k \in \mathcal{K}, \qquad Y_{11} = 1, \quad \mathbf{Y} \succeq 0, \quad \operatorname{rank}(\mathbf{Y}) = 1, \tag{28}
\]
where
\[
\mathbf{Y} = \begin{pmatrix} 1 \\ \mathbf{y} \end{pmatrix} (1, \mathbf{y}'), \qquad \bar{\mathbf{A}}_k = \begin{pmatrix} c_k & \mathbf{b}_k' \\ \mathbf{b}_k & \mathbf{A}_k \end{pmatrix}.
\]

Proof. The quadratic function $f_k(\cdot)$ can be written as
\[
f_k(\mathbf{y}) = (1, \mathbf{y}') \begin{pmatrix} c_k & \mathbf{b}_k' \\ \mathbf{b}_k & \mathbf{A}_k \end{pmatrix} \begin{pmatrix} 1 \\ \mathbf{y} \end{pmatrix} = \bar{\mathbf{A}}_k \bullet \mathbf{Y},
\]
where $\mathbf{Y}$ and $\bar{\mathbf{A}}_k$ are as above. Clearly,
\[
Y_{11} = 1, \quad \mathbf{Y} \succeq 0, \quad \text{and} \quad \operatorname{rank}(\mathbf{Y}) = 1.
\]
In addition, $\bar{\mathbf{A}}_k \bullet \mathbf{Y} = f_k(\mathbf{y}) \le 0$ for all $k \in \mathcal{K}$. On the other hand, given a feasible solution $\mathbf{Y}$ to (28), because $\operatorname{rank}(\mathbf{Y}) = 1$ and $Y_{11} = 1$, there exists a vector $\mathbf{y}$ such that
\[
\mathbf{Y} = \begin{pmatrix} 1 \\ \mathbf{y} \end{pmatrix} (1, \mathbf{y}'),
\]
and clearly $\mathbf{y}$ is feasible to (27).
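The following is a minimal cvxpy sketch of the lifting in Proposition 3 for a single quadratic constraint, with the rank(Y) = 1 condition relaxed to the trace heuristic (the log-det refinement is Algorithm 3 in Section 5.2); the data are illustrative:

```python
import cvxpy as cp
import numpy as np

# Lifting of one non-convex quadratic constraint y'Ay + 2b'y + c <= 0 into the
# PSD matrix Y of Proposition 3, with rank(Y) = 1 relaxed to minimizing trace(Y).
m = 3
A = np.diag([1.0, -1.0, 0.5])                 # illustrative data
b = np.array([0.2, -0.1, 0.0])
c = -1.0
Abar = np.block([[np.array([[c]]), b[None, :]],
                 [b[:, None], A]])            # the lifted matrix \bar{A}

Y = cp.Variable((m + 1, m + 1), PSD=True)
prob = cp.Problem(cp.Minimize(cp.trace(Y)),   # trace(Y) as a convex proxy for rank(Y)
                  [cp.trace(Abar @ Y) <= 0, Y[0, 0] == 1])
prob.solve(solver=cp.SCS)

y = np.array(Y.value[1:, 0])                  # candidate y, valid when Y is (near) rank one
print(y @ A @ y + 2 * b @ y + c)              # constraint value at the recovered candidate
```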


Using Propositions 2 and 3, we next show that the feasibility problem (22)-(26) is equivalent to checking whether the optimal value of the semidefinite optimization problem (29) is equal to one:
\[
\begin{aligned}
\min\;& \operatorname{rank}(\mathbf{Y}) && (29) \\
\text{s.t.}\;& \mathbf{A}_i \bullet \mathbf{Y} \le 0, && \forall i \in \mathcal{M}, \\
& \mathbf{B}_{ikt} \bullet \mathbf{Y} \le 0, && \forall t \in \mathcal{T},\; \forall i, k \in \mathcal{M},\; k \neq i, \\
& \mathbf{C}_i \bullet \mathbf{Y} \le 0, && \forall i \in \mathcal{M}, \\
& \mathbf{E}_{ik} \bullet \mathbf{Y} \le 0, && \forall i, k \in \mathcal{M},\; k \neq i, \\
& \mathbf{D}_{it} \bullet \mathbf{Y} = 0, && \forall i \in \mathcal{M},\; t \in \mathcal{T}, \\
& \mathbf{Y} \succeq 0,
\end{aligned}
\]
where
\[
\mathbf{A}_i(p, q) = \begin{cases}
-nP, & \text{if } p = 1,\, q = 1, \\
0, & \text{if } p = 1,\, q > 1, \\
0, & \text{if } p > 1,\, q = 1, \\
1, & \text{if } p = q = (i-1)n + 1, \dots, in + 1, \\
0, & \text{otherwise};
\end{cases}
\]
\[
\mathbf{C}_i(p, q) = \begin{cases}
(1-\varepsilon)T, & \text{if } p = 1,\, q = 1, \\
-1, & \text{if } p = q = n^2 + 1 + (i-1)T + t,\; \forall t = 1, \dots, T, \\
0, & \text{otherwise};
\end{cases}
\]
\[
\mathbf{D}_{it}(p, q) = \begin{cases}
0, & \text{if } p = 1,\, q = 1, \\
-\tfrac{1}{2}, & \text{if } p = 1,\, q = n^2 + 1 + (i-1)T + t, \\
-\tfrac{1}{2}, & \text{if } q = 1,\, p = n^2 + 1 + (i-1)T + t, \\
1, & \text{if } p = q = n^2 + 1 + (i-1)T + t, \\
0, & \text{otherwise};
\end{cases}
\]
\[
\mathbf{E}_{ik}(p, q) = \begin{cases}
4n\zeta^2, & \text{if } p = 1,\, q = 1, \\
0, & \text{if } p = 1,\, q > 1, \\
0, & \text{if } p > 1,\, q = 1, \\
1, & \text{if } q > 1,\, p = (i-1)n + 1, \dots, in + 1, \\
-1, & \text{if } p > 1,\, q = (k-1)n + 1, \dots, kn + 1, \\
0, & \text{otherwise};
\end{cases}
\]
\[
\mathbf{B}_{ikt}(p, q) = \begin{cases}
-M_0^2, & \text{if } p = 1,\, q = 1, \\
z_{tr}, & \text{if } p = 1,\, q = (k-1)n + 1 + r,\; \forall r = 1, \dots, n, \\
z_{tr}, & \text{if } q = 1,\, p = (k-1)n + 1 + r,\; \forall r = 1, \dots, n, \\
-z_{tr}, & \text{if } p = 1,\, q = (i-1)n + 1 + r,\; \forall r = 1, \dots, n, \\
-z_{tr}, & \text{if } q = 1,\, p = (i-1)n + 1 + r,\; \forall r = 1, \dots, n, \\
-1, & \text{if } p = q = (i-1)n + 1, \dots, in + 1, \\
-1, & \text{if } p = q = (k-1)n + 1, \dots, kn + 1, \\
1, & \text{if } q = (k-1)n + 1 + r,\, p = (i-1)n + 1 + r,\; \forall r = 1, \dots, n, \\
1, & \text{if } q = (i-1)n + 1 + r,\, p = (k-1)n + 1 + r,\; \forall r = 1, \dots, n, \\
\tfrac{M_0^2}{2}, & \text{if } p = 1,\, q = n^2 + 1 + (i-1)T + t, \\
\tfrac{M_0^2}{2}, & \text{if } q = 1,\, p = n^2 + 1 + (i-1)T + t, \\
0, & \text{otherwise}.
\end{cases}
\]

We summarize the discussion by presenting the following algorithm, which checks whether a given rate $R$ and an error probability $\varepsilon$ are achievable by a sender with power $P$ on a channel with noise standard deviation $\sigma$ using codewords of length $n$. As discussed before, the accuracy of the algorithm is governed by the parameter $\nu$.

Algorithm 1. Achievable Rates on a Single-User Gaussian Channel
Input: Code parameters $n, R, \varepsilon$; channel parameters $P, \sigma$; and accuracy parameter $\nu$.
Output: Codewords $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$, if the rate $R$ is achievable.
Algorithm:
1. Solve the rank minimization semidefinite optimization problem (29) to compute $r^*$, codewords $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$, and auxiliary binary variables $\{v_{it}\}_{t \in \mathcal{T}}$.
2. If $r^* = 1$, declare that the rate $R$ is achievable using the codebook $\mathcal{B} = \{\mathbf{x}_i\}_{i \in \mathcal{M}}$ and the minimum distance decoding function, achieving a decoding error probability of $2\varepsilon$.
3. If $r^* \ge 2$, declare that the rate $R$ cannot be achieved on a single-user Gaussian channel with noise standard deviation $(1+3\nu)\sigma$ with probability of error less than or equal to $\varepsilon(1 - \delta(\nu, n))$, where
\[
\delta(\nu, n) = \exp\left( -n \cdot \frac{r - \log(1+r)}{2(1+3\nu)^2 \sigma^2} \right), \quad \text{with } r = \sigma^2\left( (1+3\nu)^2 - (1+2\nu)^2 \right).
\]

When a particular value of $R$ leads to $r^* = 1$, $R$ is a lower bound on the channel capacity, and we also obtain a code that achieves the rate $R$. When a particular value of $R$ leads to $r^* \ge 2$, $R$ is an upper bound on the channel capacity. In Theorem 1, we show how we use Algorithm 1 to deduce lower and upper bounds on the capacity of a single-user Gaussian channel.

We next present the main result of this paper.

Theorem 1. Let $\mathcal{R}_n[P, \sigma, \varepsilon]$ be the set of all achievable rates on a single-user Gaussian channel with power $P$, noise standard deviation $\sigma$, and maximum decoding error probability $\varepsilon$. Consider the optimization problem (29) constructed for parameters $n, R, P, \sigma, \varepsilon$, and let $r^*$ and $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$ be an optimal solution. Then:

(a) If $r^* = 1$, then $R \in \mathcal{R}_n[P, \sigma, 2\varepsilon]$; that is, the rate $R$ is achievable using the codebook $\mathcal{B} = \{\mathbf{x}_i\}_{i \in \mathcal{M}}$ and the minimum distance decoding function, achieving a maximum decoding error probability of $2\varepsilon$.

(b) If $r^* \ge 2$, then $R \notin \mathcal{R}_n[P, (1+3\nu)\sigma, \bar{\varepsilon} = \varepsilon(1 - \delta(\nu, n))]$, which means that the rate $R$ cannot be achieved on a channel with noise standard deviation $(1+3\nu)\sigma$, where
\[
\delta(\nu, n) = \exp\left( -n \cdot \frac{r - \log(1+r)}{2(1+3\nu)^2 \sigma^2} \right), \quad \text{with } r = \sigma^2\left( (1+3\nu)^2 - (1+2\nu)^2 \right).
\]

Discussion. Theorem 1(a) indicates that for values $(n, \varepsilon/2, P, \sigma, \nu, R)$, if $r^* = 1$, then the rate $R$ is achievable, and thus such an $R$ provides a lower bound on the capacity $\mathcal{R}_n[P, \sigma, \varepsilon]$. Theorem 1(b) indicates that for values $(n, \varepsilon/(1 - \delta(\nu, n)), P, \sigma/(1+3\nu), \nu, R)$, if $r^* \ge 2$, then the rate $R$ is not achievable, and thus such an $R$ provides an upper bound on the capacity $\mathcal{R}_n[P, \sigma, \varepsilon]$. In this way, Algorithm 1 provides upper and lower bounds on the capacity of a single-user Gaussian channel for finite $n$. In the limit $\nu, \varepsilon \to 0$ and $n \to \infty$, the lower and upper bounds are tight. So, in principle, our approach provides valid upper and lower bounds. In numerical implementations, however, we do not solve problem (29) to provable optimality due to its size and complexity. When we find $r^* = 1$, we still provide a valid lower bound on the channel capacity, but when we report $r^* \ge 2$, we do not have a guarantee, as we have not solved problem (29) to provable optimality. In this way, the upper bound we report can only be seen as an approximation.

3.2 Proof of Theorem 1

We present the proof of Theorem 1 in this section. Before we proceed, we establish the following notation. We let $E_i[\mathbf{z}]$ denote the event that a decoding error occurs when message $i$ is sent on the channel and noise vector $\mathbf{z}$ is realized, that is,
\[
E_i[\mathbf{z}] = \left\{ \exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}\| \le \|\mathbf{z}\| \right\}.
\]
Let $\mathbf{1}\{E_i[\mathbf{z}]\}$ denote the indicator random variable corresponding to $E_i[\mathbf{z}]$. Furthermore, let $S_n(r)$ be the $n$-dimensional shell of radius $r$ given by
\[
S_n(r) = \left\{ \mathbf{s} \in \mathbb{R}^n \,\middle|\, \|\mathbf{s}\| = r \right\}, \tag{30}
\]
and let $\mathbf{s}(r)$ denote a vector chosen uniformly at random in $S_n(r)$. We next review two results from the literature which are integral to the proof of correctness of Algorithm 1. Wyner [1967] showed that a large collection of uniformly random points on a sphere can be used as a good approximation for all the points on the sphere, as given in Proposition 4.

Proposition 4 (Wyner [1967]). Let $\mathcal{A} = \{\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_N\}$ be a Voronoi tessellation of $S_n(\gamma)$. Then, for all $\mathbf{s} \in S_n(\gamma)$, there exists $\mathbf{a}_i \in \mathcal{A}$ such that $\|\mathbf{s} - \mathbf{a}_i\| \le \gamma / N^{1/n}$.

The following result lists three important properties of a vector of independent normal random variables.

Proposition 5 (Cover and Thomas [2006]). Let $\mathbf{z}_G \sim N(\mathbf{0}, \sigma \cdot \mathbf{I})$.

(a) (Bernstein's inequality) The vectors $\mathbf{z}_G$ are concentrated in a thin shell of radius $\sigma\sqrt{n}$, that is,
\[
\mathbb{P}\left[ \frac{1}{n} \|\mathbf{z}_G\|^2 > \sigma^2 - r \right] \ge 1 - \exp\left( -n \cdot \frac{r - \log(1+r)}{2\sigma^2} \right).
\]

(b) (Spherical symmetry) The random vector $\mathbf{u} = \mathbf{z}_G / \|\mathbf{z}_G\|$ is distributed uniformly in $S_n(1)$.

(c) Let $d$ be a random variable distributed identically to the norm of $\mathbf{z}_G$, that is, $d \sim \|\mathbf{z}_G\|$. Then $\mathbf{z}_G \sim d \cdot \mathbf{s}(1)$, where $\mathbf{s}(1)$ denotes a vector chosen uniformly at random in $S_n(1)$.

We next present a series of propositions that form the components of the proof of Theorem 1. In Proposition 6, we present some of the geometric properties we need.

Proposition 6. Let $r_A, r_B, \gamma > 0$. Let $A$, $B$ and $C$ be three single-user channels with noise vectors $\mathbf{u}_A$, $\mathbf{u}_B$, $\mathbf{z}_C$, where $\mathbf{u}_A$, $\mathbf{u}_B$ are distributed uniformly in $S_n(r_A)$ and $S_n(r_B)$, respectively, and $\mathbf{z}_C \sim N(\mathbf{0}, \gamma \cdot \mathbf{I})$. Let $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$ be the set of codewords. Then:

(a) If $r_A \ge r_B$, then $\mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{u}_A\| \le r_A\right] \ge \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{u}_B\| \le r_B\right]$;

(b) $\mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{u}_A\| \le r_A\right] \ge \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_C\| \le \|\mathbf{z}_C\| \,\middle|\, \|\mathbf{z}_C\| \le r_A\right]$;

(c) $\mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{u}_A\| \le r_A\right] \le \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_C\| \le \|\mathbf{z}_C\| \,\middle|\, \|\mathbf{z}_C\| > r_A\right]$.

In Proposition 7, we show that a code that is "good" with respect to Gaussian noise is also "good" for uniform noise on a shell.

Proposition 7. Let $\mathcal{C}$ be a code with codebook $\mathcal{B}$ and a minimum distance decoding function. Consider two noises $\mathbf{z}_G \sim N(\mathbf{0}, \sigma \cdot \mathbf{I})$ and $\mathbf{z}_U$ uniformly distributed on $S_n(\bar{\sigma}\sqrt{n})$ with $\bar{\sigma} < \sigma$. If $\mathbb{P}[E_i[\mathbf{z}_G]] \le \varepsilon$, then
\[
\mathbb{P}[E_i[\mathbf{z}_U]] \le \frac{\varepsilon}{1 - \exp(-n\beta)}, \quad \text{with } \beta = \frac{\sigma^2 - \bar{\sigma}^2 - \log\left(1 + \sigma^2 - \bar{\sigma}^2\right)}{2\sigma^2}.
\]

We next examine certain properties of optimal solutions of (29). Let $\mathcal{W}$ be the set of scaled noise vectors defined by
\[
\mathcal{W} = \left\{ \mathbf{w}_t \,\middle|\, \mathbf{w}_t = \frac{1}{1+\nu} \mathbf{z}_t,\; \forall t \in \mathcal{T} \right\}, \quad \text{and} \quad \tau(\mathbf{s}) = \arg\min_{t \in \mathcal{T}} \|\mathbf{s} - \mathbf{w}_t\|.
\]

Proposition 8. Let $\varepsilon > 0$ and $\gamma_\varepsilon$ be as in (16). Let $\{r^*, \mathbf{x}_i, v_{it}\}$ be an optimal solution of (29). If $r^* = 1$, then for $\mathbf{s}$ uniformly distributed on $S_n(\gamma_\varepsilon)$, we have:

(a) $\mathbb{P}\left[\forall k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{s}\| \ge \gamma_\varepsilon\right] \ge \mathbb{P}\left[v_{i\tau(\mathbf{s})} = 1\right]$;

(b) $\mathbb{P}\left[v_{i\tau(\mathbf{s})} = 1\right] \ge 1 - \varepsilon$.

All the proofs are presented in Appendix A. We next present the proof of Theorem 1.

Proof of Theorem 1. In Part (a), we have $r^* = 1$ and, therefore, we compute the codewords $\{\mathbf{x}_i\}$ that we then transmit on the channel. We then show that the probability of error when these codewords are transmitted on a single-user Gaussian channel with noise standard deviation $\sigma$ is bounded by $2\varepsilon$. In Part (b), when $r^* \ge 2$, we cannot compute feasible codewords $\{\mathbf{x}_i\}$, and we then show that the rate $R$ cannot be achieved on a channel with noise standard deviation $(1+3\nu)\sigma$ with an error probability of $\bar{\varepsilon} = \varepsilon(1 - \delta(\nu, n))$. That is, we show that if the rate $R$ could be achieved on a channel with noise standard deviation $(1+3\nu)\sigma$ with error probability $\bar{\varepsilon}$, then it would have been accepted by Algorithm 1.


We begin with the proof of Part (a).

(a) Let $\mathbf{z}_G$ be an $n$-dimensional Gaussian noise with standard deviation $\sigma$, that is, $\mathbf{z}_G \sim N(\mathbf{0}, \sigma \cdot \mathbf{I})$, and let $f_G(\cdot)$ be the probability density function defined as
\[
f_G(x) = \frac{d\,\mathbb{P}\left[\|\mathbf{z}_G\| \le x\right]}{dx}.
\]
We next calculate the probability of error when codeword $i$ is sent on the channel:
\[
\begin{aligned}
\mathbb{P}[E_i[\mathbf{z}_G]] ={}& \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_G\| \le \|\mathbf{z}_G\|\right] \\
={}& \int_0^{\gamma_\varepsilon} \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_G\| \le \|\mathbf{z}_G\| \,\middle|\, \|\mathbf{z}_G\| = c\right] f_G(c)\, dc \\
&+ \int_{\gamma_\varepsilon}^{\infty} \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_G\| \le \|\mathbf{z}_G\| \,\middle|\, \|\mathbf{z}_G\| = c\right] f_G(c)\, dc.
\end{aligned}
\]
We bound the second term as follows:
\[
\int_{\gamma_\varepsilon}^{\infty} \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_G\| \le \|\mathbf{z}_G\| \,\middle|\, \|\mathbf{z}_G\| = c\right] f_G(c)\, dc \le \int_{\gamma_\varepsilon}^{\infty} f_G(c)\, dc = \mathbb{P}\left[\|\mathbf{z}_G\| > \gamma_\varepsilon\right] \le \varepsilon, \tag{31}
\]
which follows from the definition of $\gamma_\varepsilon$ in (16). We bound the first term as follows:
\[
\begin{aligned}
\int_0^{\gamma_\varepsilon} \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_G\| \le \|\mathbf{z}_G\| \,\middle|\, \|\mathbf{z}_G\| = c\right] f_G(c)\, dc
&= \int_0^{\gamma_\varepsilon} \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_G\| \le c \,\middle|\, \|\mathbf{z}_G\| = c\right] f_G(c)\, dc \\
&= \int_0^{\gamma_\varepsilon} \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{s}(c)\| \le c\right] f_G(c)\, dc,
\end{aligned}
\]
which follows from Proposition 5(c) by observing that, conditioned on $\|\mathbf{z}_G\| = c$, the distribution of $\mathbf{z}_G$ is identical to that of $\mathbf{s}(c)$, a vector uniformly distributed on the $n$-dimensional sphere of radius $c$. We then have
\[
\begin{aligned}
\int_0^{\gamma_\varepsilon} \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{s}(c)\| \le c\right] f_G(c)\, dc
&\le \int_0^{\gamma_\varepsilon} \mathbb{P}\left[\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{s}(\gamma_\varepsilon)\| \le \gamma_\varepsilon\right] f_G(c)\, dc && \text{(from Prop. 6(a), since } c \le \gamma_\varepsilon\text{)} \\
&\le \int_0^{\gamma_\varepsilon} \left(1 - \mathbb{P}\left[v_{i\tau(\mathbf{s}(\gamma_\varepsilon))} = 1\right]\right) f_G(c)\, dc && \text{(from Prop. 8(a))} \\
&\le \int_0^{\gamma_\varepsilon} \left(1 - (1 - \varepsilon)\right) f_G(c)\, dc && \text{(from Prop. 8(b))} \\
&= \varepsilon \int_0^{\gamma_\varepsilon} f_G(c)\, dc \le \varepsilon. && (32)
\end{aligned}
\]

From (31) and (32), we have $\mathbb{P}[E_i[\mathbf{z}_G]] \le 2\varepsilon$, and Part (a) of the theorem follows.

(b) To prove Part (b), we prove its contrapositive; that is, we show that if we choose a rate $R \in \mathcal{R}_n(P, (1+3\nu)\sigma, \bar{\varepsilon})$, where $\bar{\varepsilon} = \varepsilon(1 - \delta(\nu, n))$ as in the statement of the theorem, then there exists a code that is feasible for Constraints (22)-(26).

Consider a rate $R$ such that $R \in \mathcal{R}_n(P, (1+3\nu)\sigma, \bar{\varepsilon})$. Then, by definition, there must exist a code $\mathcal{C} = \{\mathbf{x}_i\}_{i \in \mathcal{M}}$ that has an error probability of $\bar{\varepsilon}$ on the channel with Gaussian noise $\tilde{\mathbf{z}}_G \sim N(\mathbf{0}, (1+3\nu)\sigma \cdot \mathbf{I})$, using codewords $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$, satisfying
\[
\mathbb{P}\left[E_i[\tilde{\mathbf{z}}_G]\right] = \mathbb{P}\left[\left\{\exists k \neq i : \|\mathbf{x}_i - \mathbf{x}_k + \tilde{\mathbf{z}}_G\| \le \|\tilde{\mathbf{z}}_G\|\right\}\right] \le \bar{\varepsilon}.
\]
We begin by showing that the codewords $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$ satisfy constraints (25). To show this, consider the probability of incorrectly decoding $\mathbf{x}_i$ as $\mathbf{x}_k$ on this channel. We have
\[
\begin{aligned}
\mathbb{P}\left[\mathbf{x}_i \text{ decoded as } \mathbf{x}_k\right] &\ge \mathbb{P}\left[\|\mathbf{x}_i + \tilde{\mathbf{z}}_G - \mathbf{x}_k\| \le \|\tilde{\mathbf{z}}_G\|\right] \\
&= \mathbb{P}\left[2 \left\langle \mathbf{x}_k - \mathbf{x}_i,\, \tilde{\mathbf{z}}_G \right\rangle \ge \|\mathbf{x}_i - \mathbf{x}_k\|^2\right] \\
&= \mathbb{P}\left[\frac{\left\langle \mathbf{x}_k - \mathbf{x}_i,\, \tilde{\mathbf{z}}_G \right\rangle}{(1+3\nu)\sigma \|\mathbf{x}_i - \mathbf{x}_k\|} \ge \frac{\|\mathbf{x}_i - \mathbf{x}_k\|}{2(1+3\nu)\sigma}\right] \\
&= 1 - \Phi\left(\frac{\|\mathbf{x}_i - \mathbf{x}_k\|}{2(1+3\nu)\sigma}\right).
\end{aligned}
\]
Since we know that $\mathbb{P}[\mathbf{x}_i \text{ decoded as } \mathbf{x}_k] \le \mathbb{P}[E_i[\tilde{\mathbf{z}}_G]] \le \bar{\varepsilon}$, we have
\[
\bar{\varepsilon} \ge 1 - \Phi\left(\frac{\|\mathbf{x}_i - \mathbf{x}_k\|}{2(1+3\nu)\sigma}\right) \implies \|\mathbf{x}_i - \mathbf{x}_k\| \ge 2(1+3\nu)\sigma\, \Phi^{-1}(1 - \bar{\varepsilon}) = 2(1+3\nu)\zeta\sqrt{n}, \tag{33}
\]

which implies that the codewords satisfy constraints (25).

We next show that the codewords $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$ satisfy Eqs. (23) and (24). In particular, we need to show that the fraction of the $\mathbf{z}_t$ that lead to a decoding error is bounded by $\varepsilon$, that is, we have to show that
\[
\frac{1}{T} \sum_{t=1}^T \mathbf{1}\left\{\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|\right\} \le \varepsilon,
\]
where $\mathbf{1}\{\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|\}$ indicates that a decoding error occurred at noise vector $\mathbf{z}_t$. Note that $\frac{1}{T}\sum_{t=1}^T \mathbf{1}\{\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|\}$ is the error probability when the noise is distributed uniformly in the set $\{\mathbf{z}_t\}_{t=1}^T$. Noting this, we analyze the error probability of this code under a channel in which the noise, which we call $\mathbf{z}_U$, is uniformly distributed in $S_n(\sigma_1\sqrt{n})$, where $\sigma_1 = (1+2\nu)\sigma$. We will show that
\[
\frac{1}{T} \sum_{t=1}^T \mathbf{1}\left\{\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|\right\} \le \mathbb{P}[E_i[\mathbf{z}_U]],
\]
which bounds the quantity on the left.

In order to show this, consider any $\mathbf{z}_t$, and let $\mathbf{u}_t = \sigma_1 \mathbf{z}_t / ((1+\nu)\sigma)$. We have $\|\mathbf{u}_t\| = \sigma_1\sqrt{n}$. In the first step, we show that
\[
\text{if } \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|, \text{ then } \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{s}\| \le \|\mathbf{s}\| \quad \forall \mathbf{s} \in V(\mathbf{u}_t).
\]
Let $\mathbf{s} \in S_n(\sigma_1\sqrt{n})$ be such that $\mathbf{s}$ is in the Voronoi region $V(\mathbf{u}_t)$ of $\mathbf{u}_t$. Applying Proposition 4 with $\mathcal{A} = \{\mathbf{u}_1, \dots, \mathbf{u}_T\}$, $N = T$, and radius $\sigma_1\sqrt{n}$, we obtain
\[
\|\mathbf{s} - \mathbf{u}_t\| \le \theta'\sqrt{n}, \tag{34}
\]
where
\[
\begin{aligned}
\theta' = \frac{\sigma_1}{T^{1/n}} &= \frac{\sigma_1\sqrt{n}}{\gamma_\varepsilon} \cdot \frac{\gamma_\varepsilon}{\sqrt{n}\, T^{1/n}} = \frac{\sigma_1\sqrt{n}}{\gamma_\varepsilon} \cdot \frac{\zeta\nu}{1+\nu} \\
&\le \frac{\sigma_1\sqrt{n}}{\gamma_\varepsilon} \cdot \frac{1}{1+2\nu} \cdot \frac{1}{2\sqrt{n}} \cdot \|\mathbf{x}_i - \mathbf{x}_k\| \cdot \frac{\nu}{1+\nu} && \text{(from (33))} \\
&= \frac{(1+2\nu)\sigma\sqrt{n}}{\gamma_\varepsilon} \cdot \frac{1}{1+2\nu} \cdot \frac{\nu}{1+\nu} \cdot \frac{1}{2\sqrt{n}} \cdot \|\mathbf{x}_i - \mathbf{x}_k\| \le \frac{\nu}{1+\nu} \cdot \frac{1}{2\sqrt{n}} \cdot \|\mathbf{x}_i - \mathbf{x}_k\|. && (35)
\end{aligned}
\]
Finally, if $\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|$, then
\[
\begin{aligned}
2 \left\langle \mathbf{x}_k - \mathbf{x}_i,\, \mathbf{s} \right\rangle &= 2 \left\langle \mathbf{x}_k - \mathbf{x}_i,\, (\mathbf{s} - \mathbf{u}_t) + \mathbf{u}_t \right\rangle \\
&\ge 2 \left\langle \mathbf{x}_k - \mathbf{x}_i,\, \mathbf{u}_t \right\rangle - 2 \|\mathbf{x}_i - \mathbf{x}_k\| \cdot \|\mathbf{s} - \mathbf{u}_t\| && \text{(Cauchy-Schwarz)} \\
&\ge \frac{1+2\nu}{1+\nu} \|\mathbf{x}_i - \mathbf{x}_k\|^2 - 2 \|\mathbf{x}_i - \mathbf{x}_k\| \cdot \theta'\sqrt{n} && \text{(from (34))} \\
&= \|\mathbf{x}_i - \mathbf{x}_k\| \cdot \left\{\frac{1+2\nu}{1+\nu} \|\mathbf{x}_i - \mathbf{x}_k\| - 2\theta'\sqrt{n}\right\} \\
&\ge \|\mathbf{x}_i - \mathbf{x}_k\| \cdot \left\{\frac{1+2\nu}{1+\nu} \|\mathbf{x}_i - \mathbf{x}_k\| - \frac{\nu}{1+\nu} \|\mathbf{x}_i - \mathbf{x}_k\|\right\} && \text{(from (35))} \\
&= \|\mathbf{x}_i - \mathbf{x}_k\|^2,
\end{aligned}
\]
which is equivalent to $\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{s}\| \le \|\mathbf{s}\|$ for all $\mathbf{s} \in V(\mathbf{u}_t)$. Therefore,
\[
\text{if } \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|, \text{ then } \|\mathbf{x}_i - \mathbf{x}_k + \mathbf{s}\| \le \|\mathbf{s}\| \quad \forall \mathbf{s} \in V(\mathbf{u}_t). \tag{36}
\]

Furthermore,
\[
\mathbb{P}[E_i[\mathbf{z}_U]] = \sum_{t=1}^T \mathbb{P}\left[E_i[\mathbf{z}_U] \,\middle|\, \mathbf{z}_U \in V(\mathbf{u}_t)\right] \cdot \mathbb{P}\left[\mathbf{z}_U \in V(\mathbf{u}_t)\right]. \tag{37}
\]
Since the set of vectors $\{\mathbf{z}_t\}$ forms a Voronoi tessellation, the vectors $\{\mathbf{u}_t\}$ also form a Voronoi tessellation of $S_n(\sigma_1\sqrt{n})$. Therefore, the Voronoi regions of the points $\mathbf{u}_t$ are identical with the same area. Consequently,
\[
\mathbb{P}\left[\mathbf{z}_U \in V(\mathbf{u}_t)\right] = \frac{1}{T}, \quad \forall t = 1, \dots, T. \tag{38}
\]
Moreover, from (36), we have that there exists a $k$ such that
\[
\mathbb{P}\left[E_i[\mathbf{z}_U] \,\middle|\, \mathbf{z}_U \in V(\mathbf{u}_t)\right] \ge \mathbf{1}\left\{\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|\right\}. \tag{39}
\]

Substituting (38) and (39) in (37), we have
\[
\mathbb{P}[E_i[\mathbf{z}_U]] \ge \frac{1}{T} \sum_{t=1}^T \mathbf{1}\left\{\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|\right\}. \tag{40}
\]
From Proposition 7 we have
\[
\mathbb{P}[E_i[\mathbf{z}_U]] \le \frac{\mathbb{P}[E_i[\tilde{\mathbf{z}}_G]]}{1 - \delta(\nu, n)} \le \frac{\bar{\varepsilon}}{1 - \delta(\nu, n)} = \varepsilon. \tag{41}
\]
Therefore, from (40), we have
\[
\frac{1}{T} \sum_{t=1}^T \mathbf{1}\left\{\|\mathbf{x}_i - \mathbf{x}_k + \mathbf{z}_t\| \le \|\mathbf{z}_t\|\right\} \le \mathbb{P}[E_i[\mathbf{z}_U]] \le \varepsilon,
\]
which implies that the codewords satisfy constraints (24). We have thus shown that the codewords satisfy all the constraints of the optimization problem, implying that $r^* = 1$. This proves the contrapositive of Part (b), completing the proof.

Recall that the asymptotic capacity region $\mathcal{R}[P, \sigma]$ is defined as
\[
\mathcal{R}[P, \sigma] = \lim_{n \to \infty} \mathcal{R}_n[P, \sigma, \varepsilon_n], \quad \text{where } \varepsilon_n \to 0 \text{ as } n \to \infty.
\]
Theorem 1 provides bounds on the channel capacity and a code that matches the lower bound for finite $n$ and $\varepsilon$, as well as explicit bounds on the error probabilities given $n$ and $\varepsilon$. In the limit $n \to \infty$, $\varepsilon \to 0$, the lower and upper bounds become tight.

From a computational point of view, we need to solve large-scale NP-hard problems to find lower and upper bounds on the channel capacity. However, we report computational evidence in Section 5 that suggests we can solve problems with $n = 140$.

4 Channels with Additive Non-Gaussian Noise

In this section, we explore how the nature of the optimization problem we solve to compute the capacity region and to find a matching code depends on the specific probabilistic assumptions we make on the noise of the channel. In the previous sections, we saw that if the noise is Gaussian, the underlying optimization problem becomes a rank minimization problem with semidefinite constraints. In Section 4.1, we show that when the noise is exponentially distributed, the underlying capacity computation problem for single-user channels is a mixed binary linear optimization problem. We also explore the cases of uniform and binary symmetric noise, and the case where we make no specific probabilistic assumption on the noise but assume that the noise sequences satisfy certain limit laws (see Section 5.1).

4.1 Single-User Channel with Additive Exponentially Distributed Noise

We consider the single-user channel when the noise is exponentially distributed. We begin by considering a single-user channel for which we intend to construct a code $\mathcal{C}$ consisting of a codebook $\mathcal{B} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_M\}$ of size $M$. We next present the maximum likelihood decoder for the additive exponential noise channel.

Proposition 9. Consider a code $\mathcal{C}$ for a single-user additive exponential noise channel with codebook $\mathcal{B} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_M\}$. The maximum likelihood decoder is given by
\[
g^E_0(\mathbf{y}) = \arg\min_{i \in \mathcal{B}(\mathbf{y})} \sum_{j=1}^n (y_j - x_{ij}),
\]
where $\mathcal{B}(\mathbf{y}) = \left\{ i \in \mathcal{B} \,\middle|\, y_j \ge x_{ij},\; \forall j = 1, \dots, n \right\}$.
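A minimal sketch of the decoder $g^E_0$ (illustrative data, not from the paper):

```python
import numpy as np

# ML decoder g_0^E of Proposition 9 for additive exponential noise:
# restrict to codewords with y_j >= x_ij for all j, then minimize sum(y_j - x_ij).
def decode_exponential(y: np.ndarray, codebook: np.ndarray) -> int:
    residuals = y[None, :] - codebook                      # y_j - x_ij for every codeword
    feasible = np.all(residuals >= 0, axis=1)              # membership in the set B(y)
    scores = np.where(feasible, residuals.sum(axis=1), np.inf)
    return int(np.argmin(scores))                          # arg min over B(y)

rng = np.random.default_rng(2)
codebook = rng.uniform(0, 1, size=(4, 8))
y = codebook[1] + rng.exponential(scale=1.0, size=8)       # y = x_i + z, z ~ Exp(1)
print(decode_exponential(y, codebook))                     # recovers index 1
```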

We next present the optimization problems that we use to characterize the capacity regions of additive exponential noise channels. In this direction, we observe from Corollary 1(c) that the typical set for an exponential distribution with parameter $\lambda$ is given by
\[
\mathcal{U}^E_\varepsilon = \left\{ (z_1, \dots, z_n) \,\middle|\, \frac{n}{\lambda} - \frac{\sqrt{n}\,\Gamma^E_\varepsilon}{\lambda} \le \sum_{j=1}^n z_j \le \frac{n}{\lambda} + \frac{\sqrt{n}\,\Gamma^E_\varepsilon}{\lambda} \right\},
\]
where $\Gamma^E_\varepsilon$ is chosen such that $\mathbb{P}\left[\mathbf{z} \in \mathcal{U}^E_\varepsilon\right] = 1 - \varepsilon$ when each component of $\mathbf{z}$ is distributed exponentially with parameter $\lambda$.

Motivated by the decoder, we next present a RO problem (42) that allows us to characterize the capacity region of a single-user additive exponential noise channel. In particular, given inputs $n, R, \lambda, P, \varepsilon, \nu$, we calculate the derived quantities $\gamma^E_\varepsilon$, $T^E$ and $M^E_0$ as follows:

1. The parameter $\gamma^E_\varepsilon$, which we choose so that $\mathbb{P}\left[\sum_{i=1}^n z_i \le \gamma^E_\varepsilon\right] \ge 1 - \varepsilon$, where $z_i \sim \mathrm{Exp}(\lambda)$.

2. The parameter $T^E$ given by
\[
T^E = \left( \frac{1+2\nu}{\zeta^E \nu} \cdot \frac{\gamma^E_\varepsilon}{\sqrt{n}} \right)^n, \quad \text{with } \zeta^E = \frac{1}{\lambda\sqrt{n}} \cdot \Psi^{-1}(1-\varepsilon),
\]
where $\Psi(\cdot)$ is the cdf of the exponential distribution.

3. The parameter $M^E_0 = (1+2\nu) \cdot \gamma^E_\varepsilon$. In addition, we generate a Voronoi tessellation $\{\mathbf{z}^E_1, \mathbf{z}^E_2, \dots, \mathbf{z}^E_T\}$ of the simplex
\[
\mathcal{P}_\varepsilon = \left\{ (z_1, \dots, z_n) \,\middle|\, \sum_{j=1}^n z_j = \frac{n}{\lambda} + \frac{\sqrt{n}\,\Gamma^E_\varepsilon}{\lambda} \right\}.
\]

Let $\mathcal{M} = \{1, \dots, 2^{nR}\}$ and $\mathcal{T} = \{1, \dots, T^E\}$. We next use the decision variables $\mathbf{x}_i$, $i \in \mathcal{M}$, and $v_{it}$, $v_{ikt}$, $i, k \in \mathcal{M}$, $t \in \mathcal{T}$, where

(a) the variables $\mathbf{x}_i$ represent the codewords;

(b) the variables $v_{it}$ and $v_{ikt}$ are binary decision variables chosen to constrain the probability of error: when $v_{it} = 1$, the set of decoding constraints in (42) is satisfied for codeword $\mathbf{x}_i$ with noise vector $\mathbf{z}^E_t$.

We construct the following mixed binary linear optimization problem:
\[
\begin{aligned}
\max\;& \sum_{i,k,t} v_{ikt} && (42) \\
\text{s.t.}\;& \sum_{j=1}^n x_{ij} \le nP, && \forall i = 1, \dots, 2^{nR}, \\
& \sum_{j=1}^n x_{ij} + (2 - v_{it} - v_{ikt}) M^E_0 \ge \sum_{j=1}^n x_{kj}, && \forall t,\; \forall i,\; k \neq i, \\
& x_{ij} + z^E_{tj} \ge x_{kj} - M^E_0 (1 - v_{ikt}), && \forall i, k, j, t, \\
& \sum_{t=1}^{T^E} v_{it} \ge (1-\varepsilon) T^E, && \forall i, \\
& v_{it}, v_{ikt} \in \{0, 1\}, && \forall i, k, t.
\end{aligned}
\]
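The following sketch builds a toy instance of problem (42) with PuLP and its bundled CBC solver in place of the commercial solvers used in the paper; the sizes, the noise vectors, and the nonnegativity of the codeword entries are all assumptions made for illustration:

```python
import numpy as np
import pulp

# Toy instance of the mixed binary linear formulation (42).
n, M, T, P, M0, eps = 4, 3, 20, 1.0, 50.0, 0.1
rng = np.random.default_rng(3)
zE = rng.exponential(scale=1.0, size=(T, n))             # stand-in noise vectors z_t^E

prob = pulp.LpProblem("exp_channel_42", pulp.LpMaximize)
# x_ij >= 0 is assumed here for the sake of a bounded toy model.
x = [[pulp.LpVariable(f"x_{i}_{j}", lowBound=0) for j in range(n)] for i in range(M)]
v = [[pulp.LpVariable(f"v_{i}_{t}", cat="Binary") for t in range(T)] for i in range(M)]
w = {(i, k, t): pulp.LpVariable(f"w_{i}_{k}_{t}", cat="Binary")
     for i in range(M) for k in range(M) if k != i for t in range(T)}

prob += pulp.lpSum(w.values())                           # objective of (42)
for i in range(M):
    prob += pulp.lpSum(x[i]) <= n * P                    # power constraint
    prob += pulp.lpSum(v[i]) >= (1 - eps) * T            # error-counting constraint
    for k in range(M):
        if k == i:
            continue
        for t in range(T):
            prob += (pulp.lpSum(x[i]) + (2 - v[i][t] - w[i, k, t]) * M0
                     >= pulp.lpSum(x[k]))                # big-M decoding constraint
            for j in range(n):
                prob += x[i][j] + zE[t][j] >= x[k][j] - M0 * (1 - w[i, k, t])

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status])
```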


We next present Algorithm 2, which computes the capacity region $\mathcal{R}^E_n[P, \lambda, \varepsilon]$ of this channel.

Algorithm 2. Capacity Computation and Optimal Coding for the Single-User Additive Exponential Noise Channel
Input: $R, P, \lambda, n, \nu, \varepsilon$.
Output: Codewords $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$ and auxiliary binary variables $\{v_{it}, v_{ikt}\}$.
Algorithm:
1. Solve the mixed binary linear optimization problem (42) to compute the codewords $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$ and auxiliary binary variables $\{v_{it}, v_{ikt}\}$.
2. If the problem is feasible, then $R \in \mathcal{R}^E_n[P, \lambda, 2\varepsilon]$; that is, $R$ can be achieved on an additive exponential noise channel using the codewords $\{\mathbf{x}_i\}_{i \in \mathcal{M}}$ and the decoding function $g^E_0(\cdot)$, with a decoding error probability of at most $2\varepsilon$.
3. If the problem is infeasible, then $R \notin \mathcal{R}^E_n\left[P, \dfrac{\lambda}{1+2\nu}, 3\varepsilon\right]$.

Theorem 2 (Capacity Region of a Single-User Additive Exponential Noise Channel).

(a) If problem (42) is feasible, then $R \in \mathcal{R}^E_n[P, \lambda, 2\varepsilon]$; that is, $R$ is achievable using the codebook $\mathcal{B} = \{\mathbf{x}_i\}_{i=1}^{2^{nR}}$ and the maximum likelihood decoder (Proposition 9), achieving an average decoding error probability of $2\varepsilon$.

(b) If problem (42) is infeasible, then $R \notin \mathcal{R}^E_n\left[P, (1+2\nu)^{-1}\lambda, 3\varepsilon\right]$.

The proof of Theorem 2 is similar to the case of Gaussian noise (Theorem 1) and is omitted.

5 Computational Results

In this section, we discuss the computational complexity of our approach and present computational results to illustrate the effectiveness of the RO approach for the single-user Gaussian and additive exponential noise channels.

5.1 Computational Complexity

We begin by discussing the computational complexity of the algorithms we presented. In Section 3, we showed that the optimization problem for Gaussian channels is a rank minimization problem with semidefinite constraints. On the other hand, in Section 4.1, we showed that for additive exponential noise channels, the optimization problem is a mixed binary linear optimization problem. We next present the size of these optimization problems as a function of the key problem parameters:

1. Additive Gaussian noise channel: Recall that the key parameters that characterize the channel coding problem (22)-(26) are the code length $n$, the noise standard deviation $\sigma$, the signal power $P$, the rate $R$, the error probability $\varepsilon$, and the approximation parameter $\nu$. Given these parameters, the number of variables is
\[
O\left( 2^{nR} + \left( \frac{(1+\nu)\sigma}{\nu} \cdot \Phi^{-1}(1-\varepsilon) \right)^n \right),
\]
which is exponential in $n$ and depends on the required accuracy $\nu$, the error probability $\varepsilon$, and the rate $R$.

2. Additive exponential noise channel: The key parameters that characterize the channel coding problem (42) are the code length $n$, the noise rate $\lambda$, the signal power $P$, the rate $R$, the error probability $\varepsilon$, and the approximation parameter $\nu$. Given these parameters, the number of variables is
\[
O\left( 2^{nR} + \left( \frac{(1+2\nu)\lambda}{\nu} \cdot \Psi^{-1}(1-\varepsilon) \right)^n \right),
\]
where $\Psi(\cdot)$ is the exponential cdf. Note that this is exponential in $n$ and depends on the required accuracy $\nu$, the error probability $\varepsilon$, and the rate $R$.

Nature of the Optimization Problem and Structure of the Typical Set

Furthermore, the nature of the optimization problem depends on the type of the typical set. The key observation is that when the typical set is a polyhedron, the underlying optimization problem is a binary linear optimization problem (either mixed or pure).

In Corollary 1, we characterized the typical sets for various distributions. In particular, we showed that the typical sets for the uniform and binary distributions are polyhedra. Moreover, if we do not know the specific distribution of the noise in a channel, we can potentially use limit laws that random sequences satisfy to model the primitives of the noise. Specifically, we may assume the following uncertainty sets for noise sequences $X = \{X_1, \dots, X_n\}$:

1. The Central Limit Theorem (CLT): The CLT states that the normalized sum of random variables $(X_1 + \dots + X_n - n\mu)/(\sigma\sqrt{n})$ is asymptotically standard normal. This allows us to construct an uncertainty set that makes use of the properties of normal random variables:
\[
\mathcal{U}^{CLT} = \left\{ (X_1, X_2, \dots, X_n) \,\middle|\, -\Gamma \le \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \le \Gamma \right\}, \tag{43}
\]
where $\Gamma$ is chosen using the properties of the normal distribution, as discussed earlier in this section.

2. Stable laws: These limit laws express the fact that a sum of many independent random variables tends to be distributed according to one of a small set of "attractor" (i.e., stable) distributions. When the variance of the variables is finite, the attractor distribution is the normal distribution. Stable laws allow us to construct uncertainty sets for heavy-tailed distributions:
\[
\mathcal{U}^{HT} = \left\{ (X_1, X_2, \dots, X_n) \,\middle|\, -\Gamma \le \frac{\sum_{i=1}^n X_i - n\mu}{n^{1/\alpha}} \le \Gamma \right\}, \tag{44}
\]
where $\Gamma$ can be chosen based on the distributional properties of the heavy-tailed random variables $X_i$.

Using the RO approach of Sections 3 and 4.1, we present a summary of the nature of the resulting optimization problems in Table 1. From Table 1, we see that in many cases the underlying optimization problem is a mixed binary linear optimization problem. Due to advances in optimization theory and processing speeds over the last three decades, large-scale mixed binary linear optimization problems are solved routinely by commercial solvers, whereas large-scale semidefinite optimization problems present computational challenges. Given that, in reality, we have a choice in modeling noise, it is reasonable, in our opinion, to model noise so that the underlying typical sets are polyhedra. In particular, when we model noise using the limit laws in Eqs. (43) and (44), the resulting optimization problems become mixed binary linear optimization problems.


Noise                      | Typical Set        | Optimization Problem
---------------------------|--------------------|-------------------------------------------------
Gaussian (independent)     | Ball in (8)        | Rank minimization with semidefinite constraints
Gaussian (correlated)      | Ellipsoid in (9)   | Rank minimization with semidefinite constraints
Exponential                | Polyhedron in (10) | Mixed binary linear optimization problem
Uniform                    | Polyhedron in (11) | Mixed binary linear optimization problem
Binary symmetric noise     | Polyhedron in (12) | Binary optimization problem
CLT uncertainty set        | Polyhedron in (43) | Mixed binary linear optimization problem
Stable law uncertainty set | Polyhedron in (44) | Mixed binary linear optimization problem

Table 1: Dependence of the nature of the optimization problem on the noise model.

5.2 Implementation

For single-user Gaussian channels we presented Algorithm 1 for encoding, which involves the solution of a rank minimization problem subject to semidefinite constraints:
\[
\begin{aligned}
\min\;& \operatorname{rank}(\mathbf{X}) \\
\text{s.t.}\;& \mathbf{A} \bullet \mathbf{X} \le 0, \quad \mathbf{X} \succeq 0.
\end{aligned} \tag{45}
\]
We use the following iterative algorithm, developed by Fazel et al. [2003], to solve Problem (45).

Algorithm 3. Solving Rank Minimization with Semidefinite Constraints
Input: $\mathbf{A}$, $\delta_0$, $K$.
Output: A matrix $\mathbf{X}_K$, solution to Problem (45).
Algorithm:
1. Solve the convex optimization problem
\[
\min\; \operatorname{Tr}(\mathbf{X}) \quad \text{s.t.}\; \mathbf{A} \bullet \mathbf{X} \le 0, \; \mathbf{X} \succeq 0,
\]
and let $\mathbf{X}_0$ denote the optimal solution.
2. For each iteration $k = 1, \dots, K$, set $\delta = \delta_0 / k$ and solve the optimization problem
\[
\min\; \operatorname{Tr}\left( \left(\mathbf{X}_{k-1} + \delta \mathbf{I}\right)^{-1} \mathbf{X} \right) \quad \text{s.t.}\; \mathbf{A} \bullet \mathbf{X} \le 0, \; \mathbf{X} \succeq 0.
\]

The key connection between Problem (45) and Algorithm 3 is the formula
\[
\operatorname{rank}(\mathbf{X}) = n - \lim_{\delta \to 0} \frac{\log\det(\mathbf{X} + \delta \mathbf{I})}{\log \delta},
\]
and thus, in order to solve Problem (45), we aim to minimize $\log\det(\mathbf{X} + \delta\mathbf{I})$ with successively decreasing values of $\delta$. Algorithm 3 can be interpreted as a steepest descent algorithm on $\log\det(\mathbf{X} + \delta\mathbf{I})$. Fazel et al. [2003] showed that Algorithm 3 is guaranteed to converge to a local minimum of $\log\det(\mathbf{X} + \delta\mathbf{I})$. The empirical behavior of Algorithm 3 depends on the choices of the parameters $\delta_0$ and $K$. Yu and Lau [2011] and Wang and Sha [2011] report that Algorithm 3 finds the minimum rank successfully in signal processing applications.

Remark. Note that when we find $r^* = 1$, we obtain a code and therefore establish a valid lower bound. On the other hand, a proof of infeasibility of $r^* = 1$ is difficult. Recall, however, that since we formulate the encoding problem (29) as a binary semidefinite problem, we can in principle produce a certificate of infeasibility by enumerating all possible binary assignments and then optimizing over the remaining semidefinite constraints. Clearly this is not a practical method, but it does illustrate that, in principle, our approach provides valid certificates, though the process can take exponential time. In our implementation, to check $r^* \ge 2$, we try 50,000 restarts and declare unachievability if we fail in each of these restarts. In this sense, the upper bounds we report are approximate, while the lower bounds are exact.

5.3 The Single-User Gaussian Channel

We consider a single-user Gaussian channel and use Algorithm 1 to compute the capacity region. In order to construct the capacity region, we choose values of n ≤ 140, ε = 0.001, ν = 0.05 and, for different values of R, we check whether the code construction problem is feasible (r∗ = 1 or r∗ ≥ 2). As discussed before, the size of the problem depends on the value of the rate R. For example, for R = 0.2 the resulting semidefinite optimization problem for n = 140 involves around 3 billion variables. In order to solve this problem, we divide the feasible region into 5,000 equal regions. In each region, we seek to find 2^{nR}/5000 codewords. Let Ak be the set of codewords in the kth region, where each codeword in this region satisfies the linear constraint

cos (kπ/5000) ≤ xi1 ≤ cos ((k − 1)π/5000) , ∀xi ∈ Ak, (46)

where cos(·) is the cosine function (which is decreasing on [0, π], so the left endpoint corresponds to the larger index). The main problem now reduces to 5,000 subproblems, each of which has 600,000 variables. Each subproblem has the extra constraint (46). We use the open-source implementation of the SDPARA algorithm (Yamashita et al. [2003]), which allows parallelization. This algorithm took 18 hours on a multi-core Linux machine with 48 GB RAM and 8 processors. For the maximum value of R = 0.35 that we computed, the total number of variables exceeds 50 trillion. In order to solve this problem, we divided the problem into 5 million equal regions as in Eq. (46); each subproblem involved 2 million variables. This entire exercise took more than 700 hours on a 32-core Linux machine with 168 GB RAM using the SDPARA algorithm.
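To make the decomposition concrete, here is a small sketch in Python (the function name and the checks are ours) that generates the K = 5,000 intervals of Eq. (46), which together cover [−1, 1]:

```python
import math

def region_bounds(K=5000):
    """Intervals for the first codeword coordinate xi1 in Eq. (46).
    Since cos is decreasing on [0, pi], region k is the interval
    [cos(k*pi/K), cos((k-1)*pi/K)]; the K regions cover [-1, 1]."""
    return [(math.cos(k * math.pi / K), math.cos((k - 1) * math.pi / K))
            for k in range(1, K + 1)]

bounds = region_bounds()
assert abs(bounds[0][1] - 1.0) < 1e-12   # first region ends at +1
assert abs(bounds[-1][0] + 1.0) < 1e-12  # last region starts at -1
```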

In Figure 1, for specific values of n, we provide the lower and upper bounds from the RO approach, that is, an interval [R̲, R̄] such that R̲ is achievable and we construct the corresponding code, while R̄ is not achievable by any code. For comparison, we present comparable lower and upper bounds using the methodology in Polyanskiy et al. [2010]. We also record the asymptotic Shannon capacity.


[Figure 1 about here: Rate R in bits per channel use (y-axis) versus codelength n (x-axis); curves shown: approximation band, RO inner bound, RO outer bound, Shannon capacity, Polyanskiy et al. [2010] inner and outer bounds.]

Figure 1: Comparison of the lower and upper bounds provided by the RO approach and by Polyanskiy et al. [2010] as a function of the code length n in a single-user Gaussian channel.

We observe that the upper bound of the RO approach is sharper than the upper bound in Polyanskiy et al. [2010], while the lower bounds are comparable. Furthermore, for n = 140, the asymptotic Shannon capacity is still quite far from the lower and upper bounds we achieve.

5.4 The Single-User Additive Exponential Noise Channel

In this section, we aim to examine whether we are able to solve problems with larger code length n if the typical sets are polyhedra as opposed to ellipsoids. We selected a single-user additive exponential noise channel and applied Algorithm 2. We were able to find lower and upper bounds on channel capacity for values of n = 300, ε = 0.001, ν = 0.05, compared to n = 140 for the Gaussian case. The mixed binary linear optimization problems we solved involved 1,430,000 variables. We used CPLEX 11.1 on a computer with 150 GB RAM and 8 processors running Linux. Checking whether a given R is achievable took 18 hours for n = 300. The ability to solve larger problems in this case is due to the fact that the state of the art in computational linear integer optimization is more advanced than that in rank minimization with semidefinite constraints, which is the key computational problem for Gaussian channels.
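To convey the flavor of these computations, the following is a purely illustrative mixed binary linear feasibility problem written in Python with the PuLP modeling library and its bundled CBC solver; it shares the ingredients of our formulations (continuous codeword coordinates, binary variables, big-M disjunctions, linear budget constraints), but the instance, variable names, and constraints are toy placeholders and not the actual encoding formulation or its data.

```python
import pulp

# Toy instance: M codewords in R^n with nonnegative coordinates, a
# budget P per codeword, and pairwise separation b between coordinate
# sums. All names and numbers here are illustrative placeholders.
n, M, P, b = 4, 3, 10.0, 3.0
bigM = P + b  # large enough to switch off either branch below

prob = pulp.LpProblem("toy_encoding_milp", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (range(M), range(n)), lowBound=0)
y = pulp.LpVariable.dicts("y", (range(M), range(M)), cat="Binary")

prob += 0  # pure feasibility: constant objective

for i in range(M):
    # Linear budget constraint on each codeword.
    prob += pulp.lpSum(x[i][j] for j in range(n)) <= P

for i in range(M):
    for k in range(i + 1, M):
        # |sum(x_i) - sum(x_k)| >= b via a big-M disjunction on y[i][k].
        d = pulp.lpSum(x[i][j] - x[k][j] for j in range(n))
        prob += d >= b - bigM * (1 - y[i][k])
        prob += -d >= b - bigM * y[i][k]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status])  # "Optimal" when a feasible code exists
```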

In Figure 2, we report the lower and upper bounds from the RO approach as a function of n. For comparison, we also record the asymptotic Shannon capacity in this case.


[Figure 2 about here: Rate R in bits per channel use (y-axis) versus codelength n (x-axis); curves shown: approximation band, RO inner bound, RO outer bound, Shannon capacity.]

Figure 2: Lower and upper bounds provided by the RO approach as a function of the code length n in a single-user exponential channel.

6 Conclusions

In this paper, we proposed a RO approach to formulate and algorithmically solve problems in network information theory. As summarized in Table 1, the nature of the optimization problem ranges from binary/mixed binary linear optimization problems to rank minimization problems with semidefinite constraints. We have been able to solve problems with n = 140 for single-user Gaussian channels, and n = 300 for single-user exponential channels. The sizes of the problems we can solve for non-Gaussian channels are in fact larger, as the state of the art in computational linear integer optimization is more advanced than that in rank minimization with semidefinite constraints, which is the key computational problem for Gaussian channels. Given that we have a choice in modeling noise, it is reasonable, in our opinion, to model noise so that the underlying typical sets are polyhedra. Furthermore, as optimization algorithms and computing infrastructure improve, we anticipate that we will be able to increase, potentially significantly, the size of the problems we are able to tackle.

Acknowledgements

We thank the Associate Editor, Professor Guo, and the referees for insightful comments that improved the paper significantly.


References

C. Bandi and D. Bertsimas. Tractable stochastic analysis in high dimensions via robust optimization. Mathematical Programming, 134(1):23–70, 2012.

C. Bandi and D. Bertsimas. Optimal design for multi-item auctions: A robust optimization approach. Mathematics of Operations Research, 39(4):1012–1038, 2014a.

C. Bandi and D. Bertsimas. Robust option pricing. European Journal of Operational Research, 239(3):842–853, 2014b.

C. Bandi and D. Bertsimas. Channel coding via robust optimization, part 2: The multi-channel case. Submitted for publication, 2015.

C. Bandi, D. Bertsimas, and N. Youssef. Robust queueing theory. Operations Research, to appear, 2015.

A. Ben-Tal and A. Nemirovski. Robust convex optimization. Mathematics of Operations Research, 23(4):769–805, 1998.

A. Ben-Tal and A. Nemirovski. Robust solutions to uncertain programs. Operations Research Letters, 25:1–13, 1999.

A. Ben-Tal and A. Nemirovski. Robust solutions of linear programming problems contaminated with uncertain data. Mathematical Programming, 88:411–424, 2000.

A. Ben-Tal, L. El-Ghaoui, and A. Nemirovski. Robust Optimization. Princeton University Press, 2009.

D. Bertsimas and M. Sim. Robust discrete optimization and network flows. Mathematical Programming, 98:49–71, 2003.

D. Bertsimas and M. Sim. The price of robustness. Operations Research, 52(1):35–53, 2004.

D. Bertsimas and J. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific and Dynamic Ideas, Belmont, Feb. 1997. ISBN 1886529191.

D. Bertsimas and R. Weismantel. Optimization Over Integers. Dynamic Ideas, Belmont, 2005. ISBN 0975914626. URL http://www.worldcat.org/isbn/0975914626.

D. Bertsimas, D. Brown, and C. Caramanis. Theory and applications of robust optimization. SIAM Review, 53:464–501, 2011.

T. Cover and J. Thomas. Elements of Information Theory. Wiley, New York, NY, USA, 2006.

IBM ILOG CPLEX. Version 11.0, 2014.

G. B. Dantzig. Maximization of a linear function of variables subject to linear inequalities. Activity Analysis of Production and Allocation, pages 339–347, 1947.

Q. Du, V. Faber, and M. Gunzburger. Centroidal Voronoi tessellations: Applications and algorithms. SIAM Review, 41(4):636–676, 1999.

Q. Du, M. Emelianenko, and L. Ju. Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations. SIAM Journal on Numerical Analysis, 44(1):102–119, 2006.

L. El-Ghaoui and H. Lebret. Robust solutions to least-squares problems with uncertain data matrices. SIAM Journal on Matrix Analysis and Applications, 18:1035–1064, 1997.

L. El-Ghaoui, F. Oustry, and H. Lebret. Robust solutions to uncertain semidefinite programs. SIAM Journal on Optimization, 9:33–52, 1998.

M. Fazel, H. Hindi, and S. P. Boyd. Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proceedings of the American Control Conference, 3:2156–2162, 2003.

R. M. Gray and D. L. Neuhoff. Quantization. IEEE Transactions on Information Theory, 44(6):2325–29, 1998.

Gurobi. Gurobi 4.0.2. Software, Dec. 2010.

Y. Liu, W. Wang, B. Lévy, F. Sun, D.-M. Yan, L. Lu, and C. Yang. On centroidal Voronoi tessellation–energy smoothness and fast computation. ACM Transactions on Graphics, 28(4):101, 2009.

S. P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.

Y. Polyanskiy, H. V. Poor, and S. Verdú. Channel coding rate in the finite blocklength regime. IEEE Transactions on Information Theory, 56(5):2307–2359, 2010.

M. Sabin and R. M. Gray. Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Transactions on Information Theory, 32(2):148–155, 1986.

C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 1948.

S. Verdú. Fifty years of Shannon theory. IEEE Transactions on Information Theory, 44(6):2057–2078, 1998.

S. Verdú and S. W. McLaughlin, editors. Information Theory: 50 Years of Discovery. IEEE Press, Piscataway, NJ, USA, 2000. ISBN 0-7803-5363-3.

M. Wang and F. Sha. Information theoretical clustering via semidefinite programming. AISTATS, pages 761–769, 2011.

A. Wyner. Random packing and coverings of the unit n-sphere. Bell System Technical Journal, 46(9):2111–2118, 1967.

M. Yamashita, K. Fujisawa, and M. Kojima. SDPARA: Semidefinite programming algorithm parallel version. Parallel Computing, 29(8):1053–1067, 2003.

H. Yu and V. K. Lau. Rank-constrained Schur-convex optimization with multiple trace/log-det constraints. IEEE Transactions on Signal Processing, 59(1):304–314, 2011.


Appendix A. Proofs of Auxiliary Results

Proof of Proposition 6.
(a) We first show that

if ‖xi − xk + z‖ ≤ ‖z‖ , then ‖xi − xk + αz‖ ≤ α ‖z‖ , ∀α ≥ 1. (47)

If ‖xi − xk + z‖ ≤ ‖z‖, then we have

‖xi − xk + αz‖ = ‖xi − xk + z + (α − 1)z‖
               ≤ ‖xi − xk + z‖ + ‖(α − 1)z‖   (triangle inequality)
               ≤ ‖z‖ + (α − 1)‖z‖ = α‖z‖.

We next consider a sample path ωB such that ∃k ≠ i : ‖xi − xk + uB(ωB)‖ ≤ rB. Then, consider a sample path ωA given by

uA(ωA) = (rA/rB) · uB(ωB).

Applying (47) with α = rA/rB ≥ 1, we have ‖xi − xk + α uB(ωB)‖ ≤ α rB, leading to

‖xi − xk + uA(ωA)‖ ≤ rA.

In other words, we have

1{Ei[uA(ωA)]} ≥ 1{Ei[uB(ωB)]}. (48)

Noting that, if uB(ωB) is uniformly distributed in Sn(rB), then uA(ωA) is also uniformly distributed in Sn(rA), by taking expectations in (48), we have

P[∃k ≠ i : ‖xi − xk + uA‖ ≤ rA] ≥ P[∃k ≠ i : ‖xi − xk + uB‖ ≤ rB].

(b) Let s ∈ Sn(1), and let fi, gi : Sn(1) → {0, 1} be defined as

fi(s) = 1 if ∃d ≤ rA : ‖xi − xk + ds‖ ≤ d for some k ≠ i, and fi(s) = 0 otherwise;
gi(s) = 1 if ‖xi − xk + rA s‖ ≤ rA for some k ≠ i, and gi(s) = 0 otherwise.

Applying (47) with z = d · s and α = rA/d ≥ 1, we have that if fi(s) = 1, then gi(s) = 1, leading to

Ps∼s(1)[gi(s) = 1] ≥ Ps∼s(1)[fi(s) = 1]. (49)

We next consider the event {Ei[ds] | d = rA}. We have

{Ei[ds] | d = rA} ⇐⇒ gi(s) = 1. (50)

Furthermore, from Proposition 5(c), we know that zC ∼ d · s(1) with d ∼ ‖zC‖. Thus,

P[Ei[zC] | ‖zC‖ ≤ rA] = P[Ei[d · s(1)] | d ≤ rA]. (51)


Next observe that, conditioned on d ≤ rA, the event Ei[d · s(1)] implies that fi(s(1)) = 1, leading to

P[Ei[ds(1)] | d ≤ rA] ≤ P[fi(s(1)) = 1]. (52)

Finally, note that conditioned on d = rA, uA ∼ s(1). Thus,

P[Ei[uA]] = Ps∼s(1)[Ei[ds] | d = rA]
          = Ps∼s(1)[gi(s) = 1]          (from (50))
          ≥ Ps∼s(1)[fi(s) = 1]          (from (49))
          ≥ P[Ei[zC] | ‖zC‖ ≤ rA].      (from (51) and (52))

(c) The proof is very similar to that of part (b) and is omitted.

Proof of Proposition 7.
Applying Proposition 6(c) with uA = zU, rA = σ̄√n, and zG = zC, we obtain:

P[Ei[zU]] ≤ P[Ei[zG] | ‖zG‖ > σ̄√n]. (53)

Furthermore,

P[‖zG‖ > σ̄√n] = P[‖zG‖² > nσ̄²] = P[‖zG‖² > nσ² − (nσ² − nσ̄²)] = P[(1/n)‖zG‖² > σ² − r],

where r = σ² − σ̄². Applying Proposition 5(a), we have

P[‖zG‖ > σ̄√n] ≥ 1 − exp(−nβ), with β = (r − log(1 + r)) / (2σ²). (54)

Therefore, we have

P[Ei[zG]] = P[Ei[zG] | ‖zG‖ > σ̄√n] · P[‖zG‖ > σ̄√n] + P[Ei[zG] | ‖zG‖ ≤ σ̄√n] · P[‖zG‖ ≤ σ̄√n]
          ≥ P[Ei[zG] | ‖zG‖ > σ̄√n] · P[‖zG‖ > σ̄√n]
          ≥ P[Ei[zU]] · P[‖zG‖ > σ̄√n],   (from (53))

which from (54) implies that

P[Ei[zU]] ≤ P[Ei[zG]] / P[‖zG‖ > σ̄√n] ≤ ε / (1 − exp(−nβ)).

Proof of Proposition 8.
(a) Constraint (23) implies that the codewords {xi}i∈M satisfy, for the noise vectors zt in Z with vit = 1,

‖xi − xk + zt‖ ≥ ‖zt‖, ∀k ≠ i ⇐⇒ ‖xi − xk‖² ≥ 2⟨xk − xi, zt⟩, ∀k ≠ i
                               ⇐⇒ ‖xi − xk‖² ≥ 2(1 + ν)⟨xk − xi, wt⟩, ∀k ≠ i. (55)


Since {z1, z2, . . . , zT} forms a Voronoi tessellation of Sn((1 + ν)γε), the set W also forms a Voronoi tessellation of Sn(γε). Therefore, letting s be any vector belonging to Sn(γε), from Proposition 4 applied to A = W, N = T, Λ = γε/√n, we obtain

‖s − wτ(s)‖ ≤ θ√n, (56)

where τ(s) = arg min_{t∈T} ‖s − wt‖, with

θ = γε / (T^{1/n} √n) = ζν / (1 + ν). (57)

From (25) and (57), we have

‖xi − xk‖ ≥ 2√n ζ = 2θ ((1 + ν)/ν) √n. (58)

We now proceed to show that if viτ(s) = 1, then ‖xi − xk + s‖ ≥ ‖s‖. We have

2⟨xk − xi, s⟩ = 2⟨xk − xi, (s − wτ(s)) + wτ(s)⟩
             ≤ 2⟨xk − xi, wτ(s)⟩ + 2 ‖xi − xk‖ · ‖s − wτ(s)‖               (Cauchy-Schwarz)
             ≤ (1 + ν)⁻¹ ‖xi − xk‖² + 2 ‖xi − xk‖ · θ√n                    (from (55) and (56))
             = ‖xi − xk‖ · {(1 + ν)⁻¹ ‖xi − xk‖ + 2θ√n}
             ≤ ‖xi − xk‖ · {(1 + ν)⁻¹ ‖xi − xk‖ + (ν/(1 + ν)) ‖xi − xk‖}   (from (58))
             = ‖xi − xk‖².

Therefore, if viτ(s) = 1, then

2⟨xk − xi, s⟩ ≤ ‖xi − xk‖² ⇐⇒ ‖xi − xk + s‖ ≥ ‖s‖.

This implies that P[∀k ≠ i : ‖xi − xk + s‖ ≥ γε] ≥ P[viτ(s) = 1].

(b) For each i ∈ B, we have

P[viτ(s) = 1] = Σ_{t=1}^{T} P[viτ(s) = 1 | τ(s) = t] · P[τ(s) = t]. (59)

We have

P[viτ(s) = 1 | τ(s) = t] = 1 if vit = 1 and 0 otherwise, that is, P[viτ(s) = 1 | τ(s) = t] = vit. (60)

Moreover, since the set of vectors W forms a Voronoi tessellation of Sn(γε), the Voronoi regions of the points wt ∈ W are identical with the same area. Consequently, choosing s uniformly induces a uniform distribution for τ(s) on the elements of the set T, that is,

P[τ(s) = t] = 1/T, ∀t ∈ T. (61)


Substituting (60) and (61) in (59), we have

P[viτ(s) = 1] = Σ_{t=1}^{T} P[viτ(s) = 1 | τ(s) = t] · P[τ(s) = t]
             = (1/T) Σ_{t=1}^{T} vit
             ≥ (1/T) · (1 − ε)T       (from (24))
             = 1 − ε.

Appendix B. Computing the Vectors Z = {z1, z2, . . . , zT} Using Lloyd's Algorithm

In this section, we present Lloyd's algorithm (Lloyd [1982]) to compute the vectors Z = {z1, z2, . . . , zT}.

Algorithm 4. Lloyd’s Algorithm (Lloyd [1982])Input: Sphere B(r), parameters δ0, T .Output: A set of vectors Z = {z1, z2, . . . , zT} that form a Voronoi tessellation of B(r) with Tpoints.Algorithm:1. Generate a uniform distribution of vectors Z0 = {z0

1, z02, . . . , z

0T} over B(r) . Set k = 0.

2. Repeat

• Compute the Voronoi diagram of the points in the set Zk.

• Let zk+1i be the centroid of the Voronoi cell that point zki belongs to.

• If maxi=1,...,T ‖zk+1i −zki ‖ ≤ δ0, stop and output Zk+1; otherwise set k := k+1 and go to Step

2.

Sabin and Gray [1986] have shown that Algorithm 4 converges to a Voronoi tessellation of B(r) with T points for appropriately chosen values of δ0 (see also Du et al. [1999, 2006]). Liu et al. [2009] have also shown that this algorithm converges at a linear rate, and have proposed other algorithms that converge faster. Note also that each iteration takes O(nT) steps. In the context of information theory, Gray and Neuhoff [1998] discuss applications of Algorithm 4 to generate rate-distortion codebooks.
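For completeness, here is a minimal Monte Carlo sketch of Algorithm 4 in Python; the function name, sample count N, and tolerance are our illustrative choices, and the cell centroids are estimated from uniform samples of B(r) rather than computed exactly from the Voronoi diagram, so this is an approximation of the algorithm above, not our implementation.

```python
import numpy as np

def lloyd_ball(n, T, r, delta0=1e-3, N=100_000, seed=0, max_iter=500):
    """Lloyd's algorithm on the ball B(r) in R^n, with Voronoi-cell
    centroids estimated from N uniform samples of B(r)."""
    rng = np.random.default_rng(seed)

    def sample_ball(m):
        # Uniform in B(r): uniform direction, radius scaled by U^(1/n).
        g = rng.standard_normal((m, n))
        g /= np.linalg.norm(g, axis=1, keepdims=True)
        return r * g * rng.random((m, 1)) ** (1.0 / n)

    Z = sample_ball(T)    # Step 1: initial points
    pts = sample_ball(N)  # fixed sample cloud for centroid estimates
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest z_t (Voronoi cell) ...
        d = np.linalg.norm(pts[:, None, :] - Z[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # ... and move each z_t to the estimated centroid of its cell.
        Z_new = np.array([pts[labels == t].mean(axis=0)
                          if np.any(labels == t) else Z[t]
                          for t in range(T)])
        if np.max(np.linalg.norm(Z_new - Z, axis=1)) <= delta0:
            return Z_new
        Z = Z_new
    return Z

# Example: 16 points tessellating the unit ball in R^3.
Z = lloyd_ball(n=3, T=16, r=1.0)
```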
