
    Chapter 2

Receiver Design for Discrete-Time Observations

    2.1 Introduction

The focus of this and the next chapter is the receiver design. The task of the receiver can be appreciated by considering a very noisy channel. Roughly speaking, this is a channel for which the signal applied at the channel input has little influence on the output. The GPS channel is a good example. Let the channel input be the electrical signal applied to the antenna of a GPS satellite orbiting at an altitude of 20,200 km, and the channel output be the signal at the antenna output of a GPS receiver at sea level. The signal of interest at the output of the receiver antenna is roughly a factor 10 weaker than the ambient noise produced by various sources, including other GPS satellites, interference from nearby electronic equipment, and the thermal noise present in all conductors. If we were to observe the receiver antenna output signal with a general-purpose instrument, such as an oscilloscope or a spectrum analyzer, we would not be able to distinguish the signal from the noise. Yet, most of the time the receiver manages to reproduce the bit sequence transmitted by a satellite of interest. This is the result of a clever operation that takes into account the source statistics, the signal's structure, and the channel statistics.

It is instructive to start with the family of channel models that produce $n$-tuple outputs. This chapter is devoted to decisions based on the output of such channels. Although we develop a general theory, the discrete-time additive white Gaussian noise (AWGN) channel will receive special attention; understanding how to deal with it is our goal in this chapter. In so doing, by the end of the chapter we will have derived the receiver for the first layer of Figure 1.3.

Figure 2.1 depicts the communication system considered in this chapter. Its components are:


A Source: The source (not represented in the figure) produces the message to be transmitted. In a typical application, the message consists of a sequence of bits, but this detail is not fundamental for the theory developed in this chapter. What is fundamental is that the source chooses one message from a set of possible messages. We are free to choose the labels we assign to the various messages, and our choice is based on mathematical convenience. For now, the mathematical model of a source is as follows. If there are $m$ possible choices, we model the source as a random variable $H$ that takes values in the message set $\mathcal{H} = \{0, 1, \ldots, m-1\}$. More often than not, all messages are assumed to have the same probability, but for generality we allow message $i$ to be chosen with probability $P_H(i)$. We visualize the mechanism that produces the source output as a random experiment that takes place inside the source and yields the outcome $H = i \in \mathcal{H}$ with probability $P_H(i)$. The message set $\mathcal{H}$ and the probability distribution $P_H$ are assumed to be known by the communication system designer.

A Channel: The channel is specified by the input alphabet $\mathcal{X}$, the output alphabet $\mathcal{Y}$, and the output distribution conditioned on the input. In other words, for each possible channel input $x \in \mathcal{X}$, we assume that we know the distribution of the channel output. If the channel output alphabet $\mathcal{Y}$ is discrete, then the output distribution conditioned on the input is the probability distribution $p_{Y|X}(\cdot|x)$, $x \in \mathcal{X}$; if $\mathcal{Y}$ is continuous, then it is the probability density function $f_{Y|X}(\cdot|x)$, $x \in \mathcal{X}$. In most examples, $\mathcal{X}$ is either the binary alphabet $\{0, 1\}$ or the set $\mathbb{R}$ of real numbers. It can also be the set $\mathbb{C}$ of complex numbers, but we have to wait until Chapter 7 to understand why.

A Transmitter: The transmitter is a mapping from the message set $\mathcal{H} = \{0, 1, \ldots, m-1\}$ to the signal set $\mathcal{C} = \{c_0, c_1, \ldots, c_{m-1}\}$, where $c_i \in \mathcal{X}^n$ for some $n$. We need the transmitter to connect two alphabets that typically are incompatible, namely $\mathcal{H}$ and $\mathcal{X}^n$. From this point of view, the transmitter is simply a sort of connector. There is another, more subtle task accomplished by the transmitter. A well-designed transmitter makes it possible for the receiver to meet the desired error probability. Towards this goal, the elements of the signal set $\mathcal{C}$ are such that a well-designed receiver observing the channel's reaction can tell (with high probability) which signal from $\mathcal{C}$ has excited the channel input.

A Receiver: The receiver's task is to guess $H$ from the channel output $Y \in \mathcal{Y}^n$. Since the transmitter map is always one-to-one, it is more realistic to picture the receiver as guessing the channel input signal and declaring the signal's index as the message guess. We use $\hat{i}$ to represent the guess made by the receiver. Like the message, the guess of the message is the outcome of a random experiment. The corresponding random variable is denoted by $\hat{H} \in \mathcal{H}$. Unless specified otherwise, the receiver will always be designed to minimize the probability of error, denoted $P_e$ and defined as the probability that $\hat{H}$ differs from $H$. Guessing the value of a discrete random variable $H$ from the value of a related random variable $Y$ is a so-called hypothesis testing problem that comes up in various contexts. We are interested in hypothesis testing to design communication systems, but it can also be used in other applications, for instance to develop a fire detector.

    First we give a few examples.

Example 2. A common source model consists of $\mathcal{H} = \{0, 1\}$ and $P_H(0) = P_H(1) = 1/2$. This models individual bits of, say, a file. Alternatively, one could model an entire file of, say, 1 Mbit by saying that $\mathcal{H} = \{0, 1, \ldots, 2^{10^6} - 1\}$ and $P_H(i) = \frac{1}{2^{10^6}}$, $i \in \mathcal{H}$.

Example 3. A transmitter for a binary source could be a map from $\mathcal{H} = \{0, 1\}$ to $\mathcal{C} = \{-a, a\}$ for some real-valued constant $a$. This is a valid choice if the channel input alphabet $\mathcal{X}$ is $\mathbb{R}$. Alternatively, a transmitter for a 4-ary source could be a map from $\mathcal{H} = \{0, 1, 2, 3\}$ to $\mathcal{C} = \{a, ja, -a, -ja\}$, where $j = \sqrt{-1}$. This is a valid choice if $\mathcal{X}$ is $\mathbb{C}$.

Example 4. The channel model that we will use frequently in this chapter is the one that maps a signal $c \in \mathbb{R}^n$ into $Y = c + Z$, where $Z$ is a Gaussian random vector of independent and identically distributed components. As we will see later, this is the discrete-time equivalent of the baseband continuous-time channel called the additive white Gaussian noise (AWGN) channel. For this reason, following common practice, we will refer to both as additive white Gaussian noise (AWGN) channels.

The chapter is organized as follows. We first learn the basic ideas behind hypothesis testing, the field that deals with the problem of guessing the outcome of a random variable based on the observation of another random variable. Then we study the $Q$ function, as it is a very valuable tool in dealing with communication problems that involve Gaussian noise. At that point, we will be ready to consider the problem of communicating across the additive white Gaussian noise channel. We will first consider the case that involves two messages and scalar signals, then the case of two messages and $n$-tuple signals, and finally the case of an arbitrary number $m$ of messages and $n$-tuple signals. The last part of the chapter deals with techniques to bound the error probability when an exact expression is unknown or too complex.

A point about terminology needs to be clarified. It might seem awkward that we use the notation $c_i \in \mathcal{C}$ to denote the transmitter output signal for message $i$. We do so because the transmitter of this chapter will become the encoder in the next and subsequent chapters. When the input is $i$, the output of the encoder is the codeword $c_i$ and the set of codewords is the codebook $\mathcal{C}$. This is the reasoning behind our notation.

[Figure: $i \in \mathcal{H} \;\to\;$ Transmitter $\;\to\; c_i \in \mathcal{C} \subset \mathcal{X}^n \;\to\;$ Channel $\;\to\; Y \in \mathcal{Y}^n \;\to\;$ Receiver $\;\to\; \hat{H} \in \mathcal{H}$]

Figure 2.1: General setup for Chapter 2.


    2.2 Hypothesis Testing

Hypothesis testing refers to the problem of deciding which hypothesis (read: event) has occurred, based on an observable (read: side information). Expressed in mathematical terms, the problem is to decide the outcome of a random variable $H$ that takes values in a finite alphabet $\mathcal{H} = \{0, 1, \ldots, m-1\}$, based on the outcome of the random variable $Y$, called the observable.

This problem comes up in various applications under different names. Hypothesis testing is the terminology used in statistics, where the problem is studied from a fundamental point of view. A receiver does hypothesis testing, but communication people call it decoding. An alarm system such as a fire detector also does hypothesis testing, but people would call it detection. A more appealing name for hypothesis testing is decision making. Hypothesis testing, decoding, detection, and decision making are all synonyms.

In communications, the hypothesis $H$ is the message to be transmitted and the observable $Y$ is the channel output. The receiver guesses $H$ based on $Y$, assuming that both the distribution of $H$ and the conditional distribution of $Y$ given $H$ are known. They represent the information we have about the source and the statistical dependence between the source and the observable, respectively.

The receiver's decision will be denoted by $\hat{H}$. If we could, we would ensure that $\hat{H} = H$, but this is generally not possible. The goal is to devise a decision strategy that maximizes the probability $P_c = Pr\{\hat{H} = H\}$ that the decision is correct.¹

We will always assume that we know the a priori probability $P_H$ and that for each $i \in \mathcal{H}$ we know the conditional probability density function² (pdf) of $Y$ given $H = i$, denoted by $f_{Y|H}(\cdot|i)$.

Hypothesis testing is at the heart of the communication problem. As described by Claude Shannon in the introduction to what is arguably the most influential paper ever written on the subject [4], "The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point."

¹ $Pr\{\cdot\}$ is a short-hand for the probability of the enclosed event.

² In most cases of interest in communication, the random variable $Y$ is a continuous one. That is why in the above discussion we have implicitly assumed that, given $H = i$, $Y$ has a pdf $f_{Y|H}(\cdot|i)$. If $Y$ is a discrete random variable, then we assume that we know the conditional probability mass function $p_{Y|H}(\cdot|i)$.

Example 5. As a typical example of a hypothesis testing problem, consider the problem of communicating one bit of information across an optical fiber. The bit being transmitted is modeled by the random variable $H \in \{0, 1\}$, $P_H(0) = 1/2$. If $H = 1$, we switch on an LED and its light is carried across the optical fiber to a photodetector at the $n$-tuple former. The photodetector outputs the number of photons $Y \in \mathbb{N}$ that it detects. The problem is to decide whether $H = 0$ (the LED is off) or $H = 1$ (the LED is on). Our decision can only be based on whatever prior information we have about the model and on the actual observation $y$. What makes the problem interesting is that it is impossible to determine $H$ from $Y$ with certainty. Even if the LED is off, the detector is likely to detect some photons (e.g. due to ambient light). A good assumption is that $Y$ is Poisson distributed with intensity $\lambda$, which depends on whether the LED is on or off. Mathematically, the situation is as follows:

$$H = 0, \quad Y \sim P_{Y|H}(y|0) = \frac{\lambda_0^y}{y!} e^{-\lambda_0},$$
$$H = 1, \quad Y \sim P_{Y|H}(y|1) = \frac{\lambda_1^y}{y!} e^{-\lambda_1},$$

where $0 \leq \lambda_0 < \lambda_1$. We read the above as follows: when $H = 0$, the observable $Y$ is Poisson distributed with intensity $\lambda_0$. When $H = 1$, $Y$ is Poisson distributed with intensity $\lambda_1$.
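To get a feel for the two hypotheses, the following sketch (illustrative only; the intensities $\lambda_0$, $\lambda_1$ are arbitrary assumed values, not from the text) evaluates the two Poisson likelihoods for a few observed photon counts and reports which hypothesis makes the observation more likely.

```python
import math

# Hypothetical intensities for illustration; the text only assumes 0 <= lambda0 < lambda1.
lam0, lam1 = 2.0, 10.0

def poisson_pmf(y, lam):
    """P_{Y|H}(y|i): Poisson probability of counting y photons with intensity lam."""
    return lam**y / math.factorial(y) * math.exp(-lam)

for y in (0, 3, 6, 12):
    l0 = poisson_pmf(y, lam0)   # likelihood under H = 0 (LED off)
    l1 = poisson_pmf(y, lam1)   # likelihood under H = 1 (LED on)
    print(f"y={y:2d}  P(y|0)={l0:.4f}  P(y|1)={l1:.4f}  more likely: H={int(l1 > l0)}")
```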

Once again, the problem of deciding the value of $H$ from the observable $Y$ is a standard hypothesis testing problem. It will always be assumed that the distribution of $H$ and that of $Y$ for each value of $H$ are known to the decision maker.

From $P_H$ and $f_{Y|H}$, via Bayes' rule, we obtain

$$P_{H|Y}(i|y) = \frac{P_H(i) f_{Y|H}(y|i)}{f_Y(y)},$$

where $f_Y(y) = \sum_i P_H(i) f_{Y|H}(y|i)$. In the above expression, $P_{H|Y}(i|y)$ is the posterior (also called the a posteriori probability of $H$ given $Y$). By observing $Y = y$, the probability that $H = i$ goes from $P_H(i)$ to $P_{H|Y}(i|y)$.

If we choose $\hat{H} = i$, then the probability that we made the correct decision is the probability that $H = i$, i.e., $P_{H|Y}(i|y)$. As our goal is to maximize the probability of being correct, the optimum decision rule is

$$\hat{H}(y) = \arg \max_i P_{H|Y}(i|y) \quad \text{(MAP decision rule)}, \tag{2.1}$$

where $\arg \max_i g(i)$ stands for one of the arguments $i$ for which the function $g(i)$ achieves its maximum. The above is called the maximum a posteriori (MAP) decision rule. In case of ties, i.e. if $P_{H|Y}(j|y)$ equals $P_{H|Y}(k|y)$ equals $\max_i P_{H|Y}(i|y)$, then it does not matter if we decide for $\hat{H} = k$ or for $\hat{H} = j$. In either case, the probability that we have decided correctly is the same.

Because the MAP rule maximizes the probability of being correct for each observation $y$, it also maximizes the unconditional probability $P_c$ of being correct. The former is $P_{H|Y}(\hat{H}(y)|y)$. If we plug in the random variable $Y$ instead of $y$, then we obtain a random variable. (A real-valued function of a random variable is a random variable.) The expected value of this random variable is the (unconditional) probability of being correct, i.e.,

$$P_c = E\left[P_{H|Y}(\hat{H}(Y)|Y)\right] = \int_y P_{H|Y}(\hat{H}(y)|y) f_Y(y)\, dy. \tag{2.2}$$


There is an important special case, namely when $H$ is uniformly distributed. In this case $P_{H|Y}(i|y)$, as a function of $i$, is proportional to $f_{Y|H}(y|i)/m$. Therefore, the argument that maximizes $P_{H|Y}(i|y)$ also maximizes $f_{Y|H}(y|i)$. Then the MAP decision rule is equivalent to the maximum likelihood (ML) decision rule:

$$\hat{H}(y) = \arg \max_i f_{Y|H}(y|i) \quad \text{(ML decision rule)}. \tag{2.3}$$

Notice that the ML decision rule does not require the prior $P_H$. For this reason, it is the solution of choice when the prior is not known.
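As a concrete illustration of the MAP rule (2.1) and the ML rule (2.3), the sketch below implements both for a finite observation alphabet; the prior and the likelihood table are hypothetical values chosen only to exercise the code, not taken from the text.

```python
import numpy as np

# Hypothetical model: m = 3 messages, discrete observable with 4 possible values.
prior = np.array([0.5, 0.3, 0.2])              # P_H(i)
lik = np.array([[0.70, 0.20, 0.05, 0.05],      # p_{Y|H}(y|0)
                [0.10, 0.60, 0.20, 0.10],      # p_{Y|H}(y|1)
                [0.05, 0.15, 0.40, 0.40]])     # p_{Y|H}(y|2)

def map_rule(y):
    """Maximize P_H(i) p_{Y|H}(y|i); the normalization f_Y(y) is common to all i."""
    return int(np.argmax(prior * lik[:, y]))

def ml_rule(y):
    """Maximize the likelihood p_{Y|H}(y|i); coincides with MAP for a uniform prior."""
    return int(np.argmax(lik[:, y]))

for y in range(4):
    print(f"y={y}: MAP decides {map_rule(y)}, ML decides {ml_rule(y)}")
```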

    2.2.1 Binary Hypothesis Testing

The special case in which we have to make a binary decision, i.e., $\mathcal{H} = \{0, 1\}$, is both instructive and of practical relevance. We begin with it and generalize in the next section.

As there are only two alternatives to be tested, the MAP test may now be written as

$$\frac{f_{Y|H}(y|1)\, P_H(1)}{f_Y(y)} \;\;\overset{\hat{H}=1}{\underset{\hat{H}=0}{\gtrless}}\;\; \frac{f_{Y|H}(y|0)\, P_H(0)}{f_Y(y)}.$$

[...]

Hence, for the derivation that follows, we assume that it is the case.

The definition of $B_{i,j}$ may be rewritten in either of the following two forms

$$\left\{ y : \frac{P_H(j) f_{Y|H}(y|j)}{P_H(i) f_{Y|H}(y|i)} \geq 1 \right\} = \left\{ y : \sqrt{\frac{P_H(j) f_{Y|H}(y|j)}{P_H(i) f_{Y|H}(y|i)}} \geq 1 \right\},$$

except that the above fraction is not defined when $f_{Y|H}(y|i)$ vanishes. This exception apart, we see that

$$1_{B_{i,j}}(y) \leq \sqrt{\frac{P_H(j) f_{Y|H}(y|j)}{P_H(i) f_{Y|H}(y|i)}}$$

is true when $y$ is inside $B_{i,j}$; it is also true when $y$ is outside, because the left side vanishes and the right side is never negative. We do not have to worry about the exception, because we will use

$$f_{Y|H}(y|i)\, 1_{B_{i,j}}(y) \leq f_{Y|H}(y|i) \sqrt{\frac{P_H(j) f_{Y|H}(y|j)}{P_H(i) f_{Y|H}(y|i)}} = \sqrt{\frac{P_H(j)}{P_H(i)}} \sqrt{f_{Y|H}(y|i) f_{Y|H}(y|j)},$$

which is obviously true when $f_{Y|H}(y|i)$ vanishes.

⁴ There are two versions of the Bhattacharyya bound. Here we derive the one that has the simpler derivation. The other version, which is tighter by a factor of 2, is derived in Problems 32 and 33.

We are now ready to derive the Bhattacharyya bound:

$$Pr\{Y \in B_{i,j} \mid H = i\} = \int_{y \in B_{i,j}} f_{Y|H}(y|i)\, dy = \int_{y \in \mathbb{R}^n} f_{Y|H}(y|i)\, 1_{B_{i,j}}(y)\, dy \leq \sqrt{\frac{P_H(j)}{P_H(i)}} \int_{y \in \mathbb{R}^n} \sqrt{f_{Y|H}(y|i) f_{Y|H}(y|j)}\, dy. \tag{2.15}$$

What makes the last integral appealing is that we integrate over the entire $\mathbb{R}^n$. As shown in Problem 35, for discrete memoryless channels the bound further simplifies.

As the name indicates, the union Bhattacharyya bound combines (2.14) and (2.15), namely

$$P_e(i) \leq \sum_{j : j \neq i} Pr\{Y \in B_{i,j} \mid H = i\} \leq \sum_{j : j \neq i} \sqrt{\frac{P_H(j)}{P_H(i)}} \int_{y \in \mathbb{R}^n} \sqrt{f_{Y|H}(y|i) f_{Y|H}(y|j)}\, dy.$$

We can now remove the conditioning on $H = i$ and obtain

$$P_e \leq \sum_i \sum_{j : j \neq i} \sqrt{P_H(i) P_H(j)} \int_{y \in \mathbb{R}^n} \sqrt{f_{Y|H}(y|i) f_{Y|H}(y|j)}\, dy.$$
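As a numerical illustration (not from the text; the scalar Gaussian channel and the parameter values are assumptions made for the sketch), the code below evaluates the union Bhattacharyya bound for two equiprobable signals observed in Gaussian noise and compares it with a simulated error probability of the ML receiver.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, a = 1.0, 1.0                       # assumed noise standard deviation and amplitude
c = np.array([-a, a])                     # two equiprobable scalar signals

# Union Bhattacharyya bound for m = 2: the two (i, j) terms each carry
# sqrt(P_H(0)P_H(1)) = 1/2, so the bound equals the integral of sqrt(f0 * f1).
y = np.linspace(-12, 12, 24001)
f0 = np.exp(-(y - c[0])**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
f1 = np.exp(-(y - c[1])**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
bound = np.sum(np.sqrt(f0 * f1)) * (y[1] - y[0])

# Monte Carlo estimate of the error probability of the ML (minimum-distance) receiver.
trials = 200_000
h = rng.integers(0, 2, trials)
obs = c[h] + sigma * rng.standard_normal(trials)
pe = np.mean((obs > 0).astype(int) != h)

print(f"union Bhattacharyya bound: {bound:.4f}   simulated Pe: {pe:.4f}")
```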

Example 20. (Tightness of the Bhattacharyya Bound) Let the message $H \in \{0, 1\}$ be equiprobable, let the channel be the binary erasure channel described in Figure 2.11, and let $c_i = (i, i, \ldots, i)^T$ be the signal used when $H = i$.

[Figure: binary erasure channel with input $X \in \{0, 1\}$ and output $Y \in \{0, 1, ?\}$; each input is received unchanged with probability $1 - p$ and erased (output $?$) with probability $p$.]

Figure 2.11: Binary erasure channel.

The Bhattacharyya bound for this case yields

$$Pr\{Y \in B_{0,1} \mid H = 0\} \leq \sum_{y \in \{0,1,?\}^n} \sqrt{P_{Y|H}(y|1)\, P_{Y|H}(y|0)} = \sum_{y \in \{0,1,?\}^n} \sqrt{P_{Y|X}(y|c_1)\, P_{Y|X}(y|c_0)} \overset{(a)}{=} p^n,$$

where in (a) we used the fact that the first factor under the square root vanishes if $y$ contains 0s and the second vanishes if $y$ contains 1s. Hence the only non-vanishing term in the sum is the one for which $y_i = ?$ for all $i$. The same bound applies for $H = 1$. Hence $P_e \leq \frac{1}{2} p^n + \frac{1}{2} p^n = p^n$. If we use the tighter version of the union Bhattacharyya bound, which as mentioned earlier is tighter by a factor of 2, then we obtain

$$P_e \leq \frac{1}{2} p^n.$$

For the binary erasure channel and the two codewords $c_0$ and $c_1$ we can actually compute the exact probability of error:

$$P_e = \frac{1}{2} Pr\{Y = (?, ?, \ldots, ?)^T\} = \frac{1}{2} p^n.$$

The Bhattacharyya bound is tight for the scenario considered in this example!
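A short simulation (with assumed values of $p$ and $n$, chosen only for illustration) confirms the exact expression: an optimal receiver errs only when every symbol is erased, and then only half of the time.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, trials = 0.3, 4, 500_000              # assumed erasure probability and codeword length

h = rng.integers(0, 2, trials)              # equiprobable messages
erased = rng.random((trials, n)) < p        # which symbols get erased
all_erased = erased.all(axis=1)

# If at least one symbol survives, the receiver recovers H exactly (the codewords are
# all-0s and all-1s). If everything is erased, it guesses and is wrong half the time.
guess = np.where(all_erased, rng.integers(0, 2, trials), h)
pe_sim = np.mean(guess != h)

print(f"simulated Pe = {pe_sim:.5f},  (1/2) p^n = {0.5 * p**n:.5f}")
```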

    2.7 Summary

The maximum a posteriori probability (MAP) rule is a decision rule that does exactly what the name implies: it maximizes the a posteriori probability and, in so doing, it maximizes the probability that the decision is correct. With hindsight, the key idea is quite simple and it applies even when there is no observable. Let us review it.

Assume that a coin is flipped and we have to guess the outcome. We model the coin by the random variable $H \in \{0, 1\}$. All we know is $P_H(0)$ and $P_H(1)$. Suppose that $P_H(0) \leq P_H(1)$. Clearly we have the highest chance of being correct if we guess $\hat{H} = 1$. We will be correct if indeed $H = 1$, and this has probability $P_H(1)$. More generally, we should choose the $i$ that maximizes $P_H(\cdot)$, and the probability of being correct is $P_H(i)$.

It is more interesting when there is some side information. The side information is obtained when we observe the outcome of a related random variable $Y$. Once we have made the observation $Y = y$, our knowledge about the distribution of $H$ gets updated from the prior distribution $P_H(\cdot)$ to the posterior distribution $P_{H|Y}(\cdot|y)$. What we have said in the previous paragraphs applies with the posterior instead of the prior.


In a typical example, $P_H(\cdot)$ is constant, whereas for the observed $y$, $P_{H|Y}(\cdot|y)$ may be strongly biased in favor of one hypothesis. If it is strongly biased, the observable has been very informative, which is what we hope, of course.

Often $P_{H|Y}$ is not given to us, but we can find it from $P_H$ and $f_{Y|H}$ via Bayes' rule.

Although $P_{H|Y}$ is the most fundamental quantity associated to a MAP test, and therefore it would make sense to write the test in terms of $P_{H|Y}$, the test is typically written in terms of $P_H$ and $f_{Y|H}$ because these are the quantities that are specified as part of the model.

Ideally, a receiver performs a MAP decision. We have emphasized the case in which all hypotheses have the same probability, as this is a common assumption in digital communication. Then the MAP and the ML rule are identical. We have paid particular attention to communication across the discrete-time AWGN channel, as it will play an important role in subsequent chapters. The ML receiver for the AWGN channel is a minimum-distance decision rule; in the simplest cases, the error probability can be computed exactly by means of the $Q$-function, and otherwise it can be upper bounded by means of the union bound and the $Q$-function.

A quite general and useful technique to upper bound the probability of error is the union Bhattacharyya bound. Notice that it applies to MAP decisions associated to general hypothesis testing problems, not only to communication problems. All we need to evaluate the union Bhattacharyya bound are $f_{Y|H}$ and $P_H$.

We end this summary with an example that shows how the posterior becomes more and more selective as the number of observations increases. The example also shows that the posterior becomes less selective if the observations are less reliable.

Example 21. Assume $H \in \{0, 1\}$ and $P_H(0) = P_H(1) = 1/2$. The outcome of $H$ is communicated across a binary symmetric channel (BSC) of crossover probability $p < \frac{1}{2}$ via a transmitter that sends $n$ 0s when $H = 0$ and $n$ 1s when $H = 1$. The BSC has input alphabet $\mathcal{X} = \{0, 1\}$, output alphabet $\mathcal{Y} = \mathcal{X}$, and transition probability $p_{Y|X}(y|x) = \prod_{i=1}^n p_{Y|X}(y_i|x_i)$, where $p_{Y|X}(y_i|x_i)$ equals $1 - p$ if $y_i = x_i$ and $p$ otherwise.

Letting $k$ be the number of 1s in the observed channel output $y$, we have

$$P_{Y|H}(y|i) = \begin{cases} p^k (1-p)^{n-k}, & H = 0 \\ p^{n-k} (1-p)^k, & H = 1. \end{cases}$$

Using Bayes' rule,

$$P_{H|Y}(i|y) = \frac{P_{H,Y}(i, y)}{P_Y(y)} = \frac{P_H(i) P_{Y|H}(y|i)}{P_Y(y)},$$

where $P_Y(y) = \sum_i P_{Y|H}(y|i) P_H(i)$ is the normalization that ensures $\sum_i P_{H|Y}(i|y) = 1$.

Hence

$$P_{H|Y}(0|y) = \frac{p^k (1-p)^{n-k}}{2 P_Y(y)} = \left(\frac{p}{1-p}\right)^k \frac{(1-p)^n}{2 P_Y(y)},$$

$$P_{H|Y}(1|y) = \frac{p^{n-k} (1-p)^k}{2 P_Y(y)} = \left(\frac{1-p}{p}\right)^k \frac{p^n}{2 P_Y(y)}.$$

Figure 2.12 depicts the behavior of $P_{H|Y}(0|y)$ as a function of the number $k$ of 1s in $y$. For the top two figures, $p = 0.25$. We see that when $n = 50$ (right figure), the posterior is very biased in favor of one or the other hypothesis, unless the number $k$ of observed 1s is nearly $n/2 = 25$. Comparing to $n = 1$ (left figure), we see that many observations allow the receiver to make a more confident decision. This is true also for $p = 0.47$ (bottom row), but we see that with the crossover probability $p$ close to $1/2$, there is a smoother transition between the region in favor of one hypothesis and the region in favor of the other. If we make only one observation (bottom left figure), then there is only a slight difference between the posterior for $H = 0$ and that for $H = 1$.

[Figure: four panels showing $P_{H|Y}(0|y)$ as a function of $k$: (a) $p = 0.25$, $n = 1$; (b) $p = 0.25$, $n = 50$; (c) $p = 0.47$, $n = 1$; (d) $p = 0.47$, $n = 50$.]

Figure 2.12: Posterior as a function of the number $k$ of 1s observed at the output of a BSC of crossover probability $p$. The channel input consists of $n$ 0s when $H = 0$ and of $n$ 1s when $H = 1$.
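The curves of Figure 2.12 are easy to regenerate from the formulas above; the sketch below (illustrative only) computes $P_{H|Y}(0|y)$ as a function of the number $k$ of observed 1s for the parameter pairs used in the figure.

```python
import numpy as np

def posterior_H0(k, n, p):
    """P_{H|Y}(0|y) when y contains k ones, for equiprobable H and a BSC(p)."""
    like0 = p**k * (1 - p)**(n - k)       # P_{Y|H}(y|0)
    like1 = p**(n - k) * (1 - p)**k       # P_{Y|H}(y|1)
    return like0 / (like0 + like1)        # the 1/2 priors cancel

for p, n in [(0.25, 1), (0.25, 50), (0.47, 1), (0.47, 50)]:
    k = np.arange(n + 1)
    post = posterior_H0(k, n, p)
    print(f"p={p}, n={n}: P(H=0 | k=0) = {post[0]:.3f}, P(H=0 | k=n) = {post[-1]:.3f}")
```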


    Appendix 2.A Facts About Matrices

In this appendix we provide a summary of useful definitions and facts about matrices. An excellent text about matrices is [5]. Hereafter, $H^\dagger$ is the conjugate transpose of the matrix $H$. It is also called the Hermitian adjoint of $H$.

Definition 22. A matrix $U \in \mathbb{C}^{n \times n}$ is said to be unitary if $U^\dagger U = I$. If $U$ is unitary and has real-valued entries, then it is orthogonal.

The following theorem lists a number of handy facts about unitary matrices. Most of them are straightforward. Proofs can be found in [5, page 67].

Theorem 23. If $U \in \mathbb{C}^{n \times n}$, the following are equivalent:

(a) $U$ is unitary;

(b) $U$ is nonsingular and $U^\dagger = U^{-1}$;

(c) $U U^\dagger = I$;

(d) $U^\dagger$ is unitary;

(e) The columns of $U$ form an orthonormal set;

(f) The rows of $U$ form an orthonormal set; and

(g) For all $x \in \mathbb{C}^n$, the Euclidean length of $y = Ux$ is the same as that of $x$; that is, $y^\dagger y = x^\dagger x$.

Theorem 24. (Schur) Any square matrix $A$ can be written as $A = U R U^\dagger$, where $U$ is unitary and $R$ is an upper triangular matrix whose diagonal entries are the eigenvalues of $A$.

Proof. Let us use induction on the size $n$ of the matrix. The theorem is clearly true for $n = 1$. Let us now show that if it is true for $n - 1$, it follows that it is true for $n$. Given $A$ of size $n$, let $v$ be an eigenvector of unit norm and $\lambda$ the corresponding eigenvalue. Let $V$ be a unitary matrix whose first column is $v$. Consider the matrix $V^\dagger A V$. The first column of this matrix is given by $V^\dagger A v = \lambda V^\dagger v = \lambda e_1$, where $e_1$ is the unit vector along the first coordinate. Thus

$$V^\dagger A V = \begin{pmatrix} \lambda & * \\ 0 & B \end{pmatrix},$$

where $B$ is square and of dimension $n - 1$. By the induction hypothesis, $B = W S W^\dagger$, where $W$ is unitary and $S$ is upper triangular. Thus,

$$V^\dagger A V = \begin{pmatrix} \lambda & * \\ 0 & W S W^\dagger \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & W \end{pmatrix} \begin{pmatrix} \lambda & * \\ 0 & S \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & W^\dagger \end{pmatrix}, \tag{2.16}$$


and putting

$$U = V \begin{pmatrix} 1 & 0 \\ 0 & W \end{pmatrix} \quad \text{and} \quad R = \begin{pmatrix} \lambda & * \\ 0 & S \end{pmatrix},$$

we see that $U$ is unitary, $R$ is upper triangular and $A = U R U^\dagger$, completing the induction step. The eigenvalues of a matrix are the roots of the characteristic polynomial. To see that the diagonal entries of $R$ are indeed the eigenvalues of $A$, it suffices to bring the characteristic polynomial of $A$ into the following form: $\det(\lambda I - A) = \det\left(U(\lambda I - R)U^\dagger\right) = \det(\lambda I - R) = \prod_i (\lambda - r_{ii})$.

Definition 25. A matrix $H \in \mathbb{C}^{n \times n}$ is said to be Hermitian if $H = H^\dagger$. It is said to be skew-Hermitian if $H^\dagger = -H$. If $H$ is Hermitian and has real-valued entries, then it is symmetric.

Recall that a polynomial of degree $n$ has exactly $n$ roots over $\mathbb{C}$. Hence an $n \times n$ matrix has exactly $n$ eigenvalues in $\mathbb{C}$, specifically the $n$ roots of the characteristic polynomial $\det(\lambda I - A)$.

Lemma 26. A Hermitian matrix $H \in \mathbb{C}^{n \times n}$ can be written as

$$H = U \Lambda U^\dagger = \sum_i \lambda_i u_i u_i^\dagger,$$

where $U$ is unitary and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix that consists of the eigenvalues of $H$. Moreover, the eigenvalues are real and the $i$th column of $U$ is an eigenvector associated to $\lambda_i$.

Proof. By Theorem 24 (Schur), we can write $H = U R U^\dagger$, where $U$ is unitary and $R$ is upper triangular with the diagonal elements consisting of the eigenvalues of $H$. From $R = U^\dagger H U$ we immediately see that $R$ is Hermitian. Being also upper triangular, it must be diagonal, and the diagonal elements must be real. If $u_i$ is the $i$th column of $U$, then

$$H u_i = U \Lambda U^\dagger u_i = U \Lambda e_i = U \lambda_i e_i = \lambda_i u_i,$$

showing that it is indeed an eigenvector associated to the $i$th eigenvalue $\lambda_i$.

Exercise 27. Show that if $H \in \mathbb{C}^{n \times n}$ is Hermitian, then $U^\dagger H U$ is real for all $U \in \mathbb{C}^n$.

A class of Hermitian matrices with a special positivity property arises naturally in many applications, including communication theory. They can be thought of as a matrix equivalent of the notion of positive numbers.

Definition 28. A Hermitian matrix $H \in \mathbb{C}^{n \times n}$ is said to be positive definite if $U^\dagger H U > 0$ for all non-zero $U \in \mathbb{C}^n$. If the above strict inequality is weakened to $U^\dagger H U \geq 0$, then $H$ is said to be positive semidefinite.


Exercise 29. Show that a non-singular covariance matrix is always positive definite.

Theorem 30. (SVD) Any matrix $A \in \mathbb{C}^{m \times n}$ can be written as a product

$$A = U D V^\dagger,$$

where $U$ and $V$ are unitary (of dimension $m \times m$ and $n \times n$, respectively) and $D \in \mathbb{R}^{m \times n}$ is non-negative and diagonal. This is called the singular value decomposition (SVD) of $A$. Moreover, by letting $k$ be the rank of $A$, the following statements are true:

(a) The columns of $V$ are the eigenvectors of $A^\dagger A$. The last $n - k$ columns span the null space of $A$.

(b) The columns of $U$ are eigenvectors of $A A^\dagger$. The first $k$ columns span the range of $A$.

(c) If $m \geq n$, then

$$D = \begin{pmatrix} \operatorname{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}) \\ 0 \end{pmatrix},$$

where $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_k > \lambda_{k+1} = \ldots = \lambda_n = 0$ are the eigenvalues of $A^\dagger A \in \mathbb{C}^{n \times n}$, which are non-negative because $A^\dagger A$ is Hermitian.

(d) If $m \leq n$, then

$$D = \left( \operatorname{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_m}) : 0 \right),$$

where $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_k > \lambda_{k+1} = \ldots = \lambda_m = 0$ are the eigenvalues of $A A^\dagger$.

Note 1: Recall that the non-zero eigenvalues of $AB$ equal the non-zero eigenvalues of $BA$; see e.g. [5, Theorem 1.3.29]. Hence the non-zero eigenvalues in (c) and (d) are the same for both cases.

Note 2: To remember that $V$ is associated to $A^\dagger A$ (as opposed to being associated to $A A^\dagger$), it suffices to look at the dimensions: $V \in \mathbb{C}^{n \times n}$ and $A^\dagger A \in \mathbb{C}^{n \times n}$.
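The statements above are easy to check numerically; the sketch below (an illustrative check with a random matrix, not part of the text) uses NumPy's SVD and verifies that $A = UDV^\dagger$ and that the squared singular values are the eigenvalues of $A^\dagger A$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# NumPy returns U (m x m), the singular values, and V^dagger (n x n).
U, s, Vh = np.linalg.svd(A)
D = np.zeros((m, n))
D[:n, :n] = np.diag(s)

print(np.allclose(A, U @ D @ Vh))                 # A = U D V^dagger
print(np.allclose(U.conj().T @ U, np.eye(m)))     # U is unitary
print(np.allclose(sorted(s**2),                   # squared singular values are
                  sorted(np.linalg.eigvalsh(A.conj().T @ A))))  # eigenvalues of A^dagger A
```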

Proof. It is sufficient to consider the case with $m \geq n$ since, if $m < n$, we can apply the result to $A^\dagger = U D V^\dagger$ and obtain $A = V D^T U^\dagger$. Hence let $m \geq n$, and consider the matrix $A^\dagger A \in \mathbb{C}^{n \times n}$. This matrix is Hermitian. Hence its eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n \geq 0$ are real and non-negative, and we can choose the eigenvectors $v_1, v_2, \ldots, v_n$ to form an orthonormal basis for $\mathbb{C}^n$. Let $V = (v_1, \ldots, v_n)$. Let $k$ be the number of positive eigenvalues and choose

$$u_i = \frac{1}{\sqrt{\lambda_i}} A v_i, \quad i = 1, 2, \ldots, k. \tag{2.17}$$

Observe that

$$u_i^\dagger u_j = \frac{1}{\sqrt{\lambda_i \lambda_j}}\, v_i^\dagger A^\dagger A v_j = \sqrt{\frac{\lambda_j}{\lambda_i}}\, v_i^\dagger v_j = \delta_{ij}, \quad 0 \leq i, j \leq k.$$

Hence $\{u_i : i = 1, \ldots, k\}$ form an orthonormal set in $\mathbb{C}^m$. Complete this set to an orthonormal basis for $\mathbb{C}^m$ by choosing $\{u_i : i = k+1, \ldots, m\}$ and let $U = (u_1, u_2, \ldots, u_m)$. Note that (2.17) implies

$$u_i \sqrt{\lambda_i} = A v_i, \quad i = 1, 2, \ldots, k, k+1, \ldots, n,$$

where for $i = k+1, \ldots, n$ the above relationship holds because $\lambda_i = 0$ and $v_i$ is a corresponding eigenvector. Using matrix notation, we obtain

$$U \begin{pmatrix} \operatorname{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}) \\ 0 \end{pmatrix} = A V, \tag{2.18}$$

i.e., $A = U D V^\dagger$. For $i = 1, 2, \ldots, m$,

$$A A^\dagger u_i = U D V^\dagger V D^T U^\dagger u_i = U D D^T U^\dagger u_i = u_i \lambda_i,$$

where the last equality uses the fact that $U^\dagger u_i$ has a 1 at position $i$ and is zero otherwise, and $D D^T = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_k, 0, \ldots, 0)$. This shows that $\lambda_i$ is also an eigenvalue of $A A^\dagger$. We have also shown that $\{v_i : i = k+1, \ldots, n\}$ spans the null space of $A$ and from (2.18) we see that $\{u_i : i = 1, \ldots, k\}$ spans the range of $A$.

The following key result is a simple application of the SVD.

Lemma 31. The linear transformation described by a matrix $A \in \mathbb{R}^{n \times n}$ maps the unit cube into a parallelepiped of volume $|\det A|$.

Proof. We want to know the volume of the region we obtain when we apply to all the points of the unit cube the linear transformation described by $A$. From the singular value decomposition, we can write $A = U D V^\dagger$, where $D$ is diagonal and $U$ and $V$ are orthogonal matrices. A transformation described by an orthogonal matrix is volume preserving. (In fact, if we apply an orthogonal matrix to an object, we obtain the same object described in a new coordinate system.) Hence we can focus our attention on the effect of $D$. But $D$ maps the unit vectors $e_1, e_2, \ldots, e_n$ into $d_1 e_1, d_2 e_2, \ldots, d_n e_n$, respectively, where $d_i$ is the $i$th diagonal entry of $D$. Hence it maps the unit cube into a rectangular parallelepiped of sides $d_1, d_2, \ldots, d_n$ and of volume $|\prod_i d_i| = |\det D| = |\det A|$, where the last equality holds because the determinant of a product (of matrices) is the product of the determinants and the determinant of an orthogonal matrix equals $\pm 1$.


Appendix 2.B Densities after One-To-One Differentiable Transformations

In this appendix we outline how to determine the density of a random vector $Y$ when we know the density of a random vector $X$ and $Y = g(X)$ for some differentiable and one-to-one function $g$.

We begin with the scalar case. Generalizing to the vector case is conceptually straightforward. Let $X \in \mathcal{X}$ be a random variable of density $f_X$ and define $Y = g(X)$ for a given one-to-one differentiable function $g : \mathcal{X} \to \mathcal{Y}$. The density becomes useful when we integrate it over some set $A$ to obtain the probability that $X \in A$. (A probability density function relates to probability like pressure relates to force.) In Figure 2.13, the shaded area under $f_X$ equals $Pr\{X \in A\}$. Now assume that $g$ maps the interval $A$ into the interval $B$. Then $X \in A$ if and only if $Y \in B$. Hence $Pr\{X \in A\} = Pr\{Y \in B\}$, which means that the two shaded areas in the figure must be identical. This requirement completely specifies $f_Y$.

For the mathematical details, we need to consider an infinitesimally small interval $A$. Then $Pr\{X \in A\} = f_X(x) l(A)$, where $l(A)$ denotes the length of $A$ and $x$ is any point in $A$. Similarly, $Pr\{Y \in B\} = f_Y(y) l(B)$, where $y = g(x)$. Hence $f_Y$ fulfills $f_X(x) l(A) = f_Y(y) l(B)$.

[Figure: the density $f_X$ with a shaded interval $A$, the map $y = g(x)$, and the density $f_Y$ with the shaded image interval $B$.]

Figure 2.13: Finding the density of $Y = g(X)$ from that of $X$. Shaded surfaces have the same area.

The last ingredient is the fact that the absolute value of the slope of $g$ at $x$ is the ratio $\frac{l(B)}{l(A)}$. (We are still assuming infinitesimally small intervals.) Hence $f_Y(y)\,|g'(x)| = f_X(x)$, and after solving for $f_Y(y)$ and using $x = g^{-1}(y)$ we obtain the desired result

$$f_Y(y) = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}. \tag{2.19}$$

Example 32. If $g(x) = ax + b$, then $f_Y(y) = \frac{f_X\left(\frac{y - b}{a}\right)}{|a|}$.

Example 33. The density $f_X$ in Figure 2.13 is Rayleigh, specifically

$$f_X(x) = \begin{cases} x \exp\left\{-\frac{x^2}{2}\right\}, & x \geq 0 \\ 0, & \text{otherwise}, \end{cases}$$

and let $Y = g(X) = X^2$. Then

$$f_Y(y) = \begin{cases} 0.5 \exp\left\{-\frac{y}{2}\right\}, & y \geq 0 \\ 0, & \text{otherwise}. \end{cases}$$
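Example 33 is easy to check by simulation; the sketch below (illustrative only, with an arbitrary sample size) squares Rayleigh samples and compares the empirical density of $Y = X^2$ with the predicted $0.5\,e^{-y/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rayleigh samples with density x*exp(-x^2/2) (unit scale parameter).
x = rng.rayleigh(scale=1.0, size=200_000)
y = x**2                                   # transformed variable Y = g(X) = X^2

# Empirical density of Y on a few bins versus the predicted 0.5*exp(-y/2).
hist, edges = np.histogram(y, bins=20, range=(0, 8), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = 0.5 * np.exp(-centers / 2)

print(np.max(np.abs(hist - predicted)))    # small, up to statistical fluctuation
```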

Before generalizing, let us summarize the scalar case. Expression (2.19) says that locally the shape of $f_Y$ is the same as that of $f_X$, but there is a denominator on the right that acts locally as a scaling factor. The absolute value of the derivative of $g$ at a point $x$ is the local slope of $g$ and it tells us how $g$ scales intervals around $x$. The larger the slope at a point, the larger the magnification of an interval around that point. As the integral of $f_X$ over an interval around $x$ must be the same as that of $f_Y$ over the corresponding interval around $y = g(x)$, if we scale up the interval size, we have to scale down the density by the same factor.

Next we consider the multidimensional case, starting with two dimensions. Let $X = (X_1, X_2)^T$ have pdf $f_X(x)$ and consider first the random vector $Y$ obtained from the affine transformation

$$Y = AX + b$$

for some non-singular matrix $A$ and vector $b$. The procedure to determine $f_Y$ parallels that for the scalar case. If $\mathcal{A}$ is a small rectangle, small enough that $f_X(x)$ can be considered as constant for all $x \in \mathcal{A}$, then $Pr\{X \in \mathcal{A}\}$ is approximated by $f_X(x) a(\mathcal{A})$, where $a(\mathcal{A})$ is the area of $\mathcal{A}$. If $\mathcal{B}$ is the image of $\mathcal{A}$, then

$$f_Y(y)\, a(\mathcal{B}) \approx f_X(x)\, a(\mathcal{A}) \quad \text{as } a(\mathcal{A}) \to 0.$$

Hence

$$f_Y(y) \approx f_X(x) \frac{a(\mathcal{A})}{a(\mathcal{B})} \quad \text{as } a(\mathcal{A}) \to 0.$$

For the next and final step, we need to know that $A$ maps $\mathcal{A}$ of area $a(\mathcal{A})$ into a surface $\mathcal{B}$ of area $a(\mathcal{B}) = a(\mathcal{A}) |\det A|$. So the absolute value of the determinant of a matrix is the amount by which areas scale through the affine transformation associated to the matrix.

This is true in any dimension $n$, but for $n = 1$ we speak of length rather than area and for $n \geq 3$ we speak of volume. (For the one-dimensional case, observe that the determinant of the scalar $a$ is $a$.) See Lemma 31 in Appendix 2.A for an outline of the proof of this important geometrical interpretation of the determinant of a matrix. Hence

$$f_Y(y) = \frac{f_X\left(A^{-1}(y - b)\right)}{|\det A|}.$$

We are ready to generalize to a function $g : \mathbb{R}^n \to \mathbb{R}^n$ which is one-to-one and differentiable. Write $g(x) = (g_1(x), \ldots, g_n(x))$ and define its Jacobian $J(x)$ to be the matrix that has $\frac{\partial g_i}{\partial x_j}$ at position $i, j$. In the neighborhood of $x$, the relationship $y = g(x)$ may be approximated by means of an affine expression of the form

$$y = Ax + b,$$

where $A$ is precisely the Jacobian $J(x)$. Hence, leveraging on the affine case, we can immediately conclude that

$$f_Y(y) = \frac{f_X(g^{-1}(y))}{|\det J(g^{-1}(y))|}, \tag{2.20}$$

which holds for any $n$.

Sometimes the new random vector $Y$ is described by the inverse function, namely $X = g^{-1}(Y)$ (rather than the other way around, as assumed so far). In this case there is no need to find $g$. The determinant of the Jacobian of $g$ at $x$ is one over the determinant of the Jacobian of $g^{-1}$ at $y = g(x)$.

As a final note, we mention that if $g$ is a many-to-one map, then for a specific $y$ the pull-back $g^{-1}(y)$ will be a set $\{x_1, \ldots, x_k\}$ for some $k$. In this case the right side of (2.20) will be

$$\sum_i \frac{f_X(x_i)}{|\det J(x_i)|}.$$

Example 34. (Rayleigh distribution) Let $X_1$ and $X_2$ be two independent, zero-mean, unit-variance Gaussian random variables. Let $R$ and $\Theta$ be the corresponding polar coordinates, i.e., $X_1 = R \cos \Theta$ and $X_2 = R \sin \Theta$. We are interested in the probability density functions $f_{R,\Theta}$, $f_R$, and $f_\Theta$. Because we are given the map $g$ from $(r, \theta)$ to $(x_1, x_2)$, we pretend that we know $f_{R,\Theta}$ and that we want to find $f_{X_1,X_2}$. Thus

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{|\det J|} f_{R,\Theta}(r, \theta),$$

where $J$ is the Jacobian of $g$, namely

$$J = \begin{pmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{pmatrix}.$$

Hence $\det J = r$ and

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{r} f_{R,\Theta}(r, \theta).$$


Using $f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi} \exp\left\{-\frac{x_1^2 + x_2^2}{2}\right\}$ and $x_1^2 + x_2^2 = r^2$ to make it a function of the desired variables $r, \theta$, and solving for $f_{R,\Theta}$, we immediately obtain

$$f_{R,\Theta}(r, \theta) = \frac{r}{2\pi} \exp\left\{-\frac{r^2}{2}\right\}.$$

Since $f_{R,\Theta}(r, \theta)$ depends only on $r$, we infer that $R$ and $\Theta$ are independent random variables and that the latter is uniformly distributed in $[0, 2\pi)$. Hence

$$f_\Theta(\theta) = \begin{cases} \frac{1}{2\pi}, & \theta \in [0, 2\pi) \\ 0, & \text{otherwise} \end{cases}$$

and

$$f_R(r) = \begin{cases} r e^{-\frac{r^2}{2}}, & r \geq 0 \\ 0, & \text{otherwise}. \end{cases}$$

We come to the same conclusion by integrating $f_{R,\Theta}$ over $\theta$ to obtain $f_R$ and by integrating over $r$ to obtain $f_\Theta$. Notice that $f_R$ is a Rayleigh probability density.
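The conclusion of Example 34 can be verified by simulation; the sketch below (illustrative, with an arbitrary sample size) converts standard Gaussian pairs to polar coordinates and checks the first moments of $R$ and $\Theta$ against the Rayleigh and uniform densities.

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal((2, 200_000))

r = np.hypot(x1, x2)                        # radius
theta = np.arctan2(x2, x1) % (2 * np.pi)    # angle folded into [0, 2*pi)

# Rayleigh with unit scale: E[R] = sqrt(pi/2); uniform on [0, 2*pi): E[Theta] = pi.
print(f"E[R]     sim {r.mean():.3f}  theory {np.sqrt(np.pi / 2):.3f}")
print(f"E[Theta] sim {theta.mean():.3f}  theory {np.pi:.3f}")
# Independence check: the correlation between R and Theta should be near zero.
print(f"corr(R, Theta) = {np.corrcoef(r, theta)[0, 1]:.4f}")
```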

    Appendix 2.C Gaussian Random Vectors

A Gaussian random vector is a collection of jointly Gaussian random variables. We learn to use vector notation as it simplifies matters significantly.

Recall that a random variable $W$ is a mapping from the sample space to $\mathbb{R}$. $W$ is a Gaussian random variable with mean $m$ and variance $\sigma^2$ if and only if its probability density function (pdf) is

$$f_W(w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(w - m)^2}{2\sigma^2}\right\}.$$

Because a Gaussian random variable is completely specified by its mean $m$ and variance $\sigma^2$, we use the short-hand notation $\mathcal{N}(m, \sigma^2)$ to denote its pdf. Hence $W \sim \mathcal{N}(m, \sigma^2)$.

An $n$-dimensional random vector ($n$-rv) $X$ is a mapping $X : \Omega \to \mathbb{R}^n$. It can be seen as a collection $X = (X_1, X_2, \ldots, X_n)^T$ of $n$ random variables. The pdf of $X$ is the joint pdf of $X_1, X_2, \ldots, X_n$. The expected value of $X$, denoted by $EX$ or by $\bar{X}$, is the $n$-tuple $(EX_1, EX_2, \ldots, EX_n)^T$. The covariance matrix of $X$ is $K_X = E[(X - \bar{X})(X - \bar{X})^T]$. Notice that $(X - \bar{X})(X - \bar{X})^T$ is an $n \times n$ random matrix, i.e., a matrix of random variables, and the expected value of such a matrix is, by definition, the matrix whose components are the expected values of those random variables. Notice that a covariance matrix is always Hermitian.


The pdf of a vector $W = (W_1, W_2, \ldots, W_n)^T$ that consists of independent and identically distributed (iid) $\mathcal{N}(0, \sigma^2)$ components is

$$f_W(w) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{w_i^2}{2\sigma^2}\right\} \tag{2.21}$$
$$= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{w^T w}{2\sigma^2}\right\}. \tag{2.22}$$

The following is one of several possible ways to define a Gaussian random vector.

Definition 35. The random vector $Y \in \mathbb{R}^m$ is a zero-mean Gaussian random vector and $Y_1, Y_2, \ldots, Y_m$ are zero-mean jointly Gaussian random variables if and only if there exists a matrix $A \in \mathbb{R}^{m \times n}$ such that $Y$ can be expressed as

$$Y = AW, \tag{2.23}$$

where $W$ is a random vector of iid $\mathcal{N}(0, 1)$ components.

Note 36. It follows immediately from the above definition that linear combinations of zero-mean jointly Gaussian random variables are zero-mean jointly Gaussian random variables. Indeed, $Z = BY = BAW$.

Recall from Appendix 2.B that if $Y = AW$ for some non-singular matrix $A \in \mathbb{R}^{n \times n}$, then

$$f_Y(y) = \frac{f_W(A^{-1}y)}{|\det A|}.$$

When $W$ has iid $\mathcal{N}(0, 1)$ components,

$$f_Y(y) = \frac{\exp\left\{-\frac{(A^{-1}y)^T (A^{-1}y)}{2}\right\}}{(2\pi)^{n/2}\, |\det A|}.$$

The above expression can be simplified and brought to the standard expression

$$f_Y(y) = \frac{1}{\sqrt{(2\pi)^n \det K_Y}} \exp\left\{-\frac{1}{2} y^T K_Y^{-1} y\right\} \tag{2.24}$$

using $K_Y = E\left[AW(AW)^T\right] = E\left[AWW^T A^T\right] = A I_n A^T = A A^T$ to obtain

$$(A^{-1}y)^T(A^{-1}y) = y^T (A^{-1})^T A^{-1} y = y^T (A A^T)^{-1} y = y^T K_Y^{-1} y$$

and $\sqrt{\det K_Y} = \sqrt{\det A A^T} = \sqrt{\det A \det A} = |\det A|$.
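The identity $K_Y = AA^T$ used above is straightforward to confirm numerically; the sketch below (an illustration with an arbitrary matrix $A$, not from the text) generates $Y = AW$ from iid standard normals, compares the sample covariance with $AA^T$, and evaluates (2.24) at one point.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5],
              [0.3, 1.0]])             # arbitrary non-singular matrix

W = rng.standard_normal((2, 500_000))  # iid N(0, 1) components
Y = A @ W                              # zero-mean Gaussian vector with K_Y = A A^T

K_theory = A @ A.T
K_sample = np.cov(Y)
print(np.round(K_theory, 3))
print(np.round(K_sample, 3))           # close to A A^T

def gaussian_pdf(y, K):
    """Zero-mean Gaussian density of (2.24)."""
    n = len(y)
    quad = y @ np.linalg.solve(K, y)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi)**n * np.linalg.det(K))

print(gaussian_pdf(np.array([0.5, -0.2]), K_theory))
```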


Fact 37. Let $Y \in \mathbb{R}^n$ be a zero-mean random vector with arbitrary covariance matrix $K_Y$ and pdf as in (2.24). As a covariance matrix is Hermitian, we can write (see Appendix 2.A)

$$K_Y = U \Lambda U^\dagger, \tag{2.25}$$

where $U$ is unitary and $\Lambda$ is diagonal. It is immediate to verify that $U\sqrt{\Lambda}\,W$ has covariance $K_Y$. This shows that an arbitrary zero-mean random vector $Y$ with pdf as in (2.24) can always be written in the form $Y = AW$, where $W$ has iid $\mathcal{N}(0, 1)$ components.

The contrary is not true in degenerate cases. We have already seen that (2.24) follows from (2.23) when $A$ is a non-singular square matrix. The derivation extends to any non-square matrix $A$, provided that it has linearly independent rows. This result is derived as a homework exercise. In that exercise we also see that it is indeed necessary that the rows of $A$ be linearly independent, as otherwise $K_Y$ is singular and $K_Y^{-1}$ is not defined. Then (2.24) is not defined either. An example will show how to handle such degenerate cases.

Note that many authors use (2.24) to define a Gaussian random vector. We favor (2.23) because it is more general, but also because it makes it straightforward to prove a number of key results associated to Gaussian random vectors. Some of these are dealt with in the examples below.

In any case, a zero-mean Gaussian random vector is completely characterized by its covariance matrix. Hence the short-hand notation $Y \sim \mathcal{N}(0, K_Y)$.

Note 38. (Degenerate case) Let $W \sim \mathcal{N}(0, 1)$, $A = (1, 1)^T$, and $Y = AW$. By our definition, $Y$ is a Gaussian random vector. However, $A$ is a matrix of linearly dependent rows, implying that $Y$ has linearly dependent components. Indeed, $Y_1 = Y_2$. This also implies that $K_Y$ is singular: it is a $2 \times 2$ matrix with 1 in each component. As already pointed out, we cannot use (2.24) to describe the pdf of $Y$. This immediately raises the question: how do we compute the probability of events involving $Y$ if we do not know its pdf? The answer is easy. Any event involving $Y$ can be rewritten as an event involving $Y_1$ only (or equivalently involving $Y_2$ only). For instance, the event $\{Y_1 \in [3, 5]\} \cap \{Y_2 \in [4, 6]\}$ occurs if and only if $\{Y_1 \in [4, 5]\}$ occurs. Hence

$$Pr\left\{Y_1 \in [3, 5] \cap Y_2 \in [4, 6]\right\} = Pr\left\{Y_1 \in [4, 5]\right\} = Q(4) - Q(5).$$

Exercise 39. Show that the $i$th component $Y_i$ of a Gaussian random vector $Y$ is a Gaussian random variable.

Solution: $Y_i = AY$ when $A = e_i^T$ is the unit row vector with 1 in the $i$th component and 0 elsewhere. Hence $Y_i$ is a Gaussian random variable. To appreciate the convenience of working with (2.23) instead of (2.24), compare this answer with the tedious derivation consisting of integrating over $f_Y$ to obtain $f_{Y_i}$ (see Problem 16).


Exercise 40. Let $U$ be an orthogonal matrix. Determine the pdf of $Y = UW$.

Solution: $Y$ is zero-mean and Gaussian. Its covariance matrix is $K_Y = U K_W U^T = U \sigma^2 I_n U^T = \sigma^2 U U^T = \sigma^2 I_n$, where $I_n$ denotes the $n \times n$ identity matrix. Hence, when an $n$-dimensional Gaussian random vector with iid $\mathcal{N}(0, \sigma^2)$ components is projected onto $n$ orthonormal vectors, we obtain $n$ iid $\mathcal{N}(0, \sigma^2)$ random variables. This result will be used often.

Exercise 41. (Gaussian random variables are not necessarily jointly Gaussian) Let $Y_1 \sim \mathcal{N}(0, 1)$, let $X \in \{\pm 1\}$ be uniformly distributed, and let $Y_2 = Y_1 X$. Notice that $Y_2$ has the same pdf as $Y_1$. This follows from the fact that the pdf of $Y_1$ is an even function. Hence $Y_1$ and $Y_2$ are both Gaussian. However, they are not jointly Gaussian. We come to this conclusion by observing that $Z = Y_1 + Y_2 = Y_1(1 + X)$ is 0 with probability $1/2$. Hence $Z$ cannot be Gaussian.

Exercise 42. Is it true that uncorrelated Gaussian random variables are always independent? If you think it is . . . think twice. The construction above, labeled "Gaussian random variables are not necessarily jointly Gaussian", provides a counterexample (you should be able to verify this without much effort). However, the statement is true if the random variables under consideration are jointly Gaussian (the emphasis is on jointly). You should be able to prove this fact using (2.24). The contrary is always true: random variables (not necessarily Gaussian) that are independent are always uncorrelated. Again, you should be able to provide the straightforward proof.

Definition 43. The random vector $Y$ is a Gaussian random vector (and $Y_1, \ldots, Y_n$ are jointly Gaussian random variables) if and only if $Y - m$ is a zero-mean Gaussian random vector as defined above, where $m = EY$. If the covariance $K_Y$ is non-singular (which implies that no component of $Y$ is determined by a linear combination of other components), then its pdf is

$$f_Y(y) = \frac{1}{\sqrt{(2\pi)^n \det K_Y}} \exp\left\{-\frac{1}{2}(y - EY)^T K_Y^{-1} (y - EY)\right\}.$$


    Appendix 2.D A Fact About Triangles

In Example 19 we have derived the error probability for PSK using the fact that for a triangle with edges $a$, $b$, $c$ and angles $\alpha$, $\beta$, $\gamma$ as shown in the left figure, the following relationship holds:

$$\frac{a}{\sin \alpha} = \frac{b}{\sin \beta} = \frac{c}{\sin \gamma}. \tag{2.26}$$

[Figure: left, a triangle with sides $a$, $b$, $c$ and angles $\alpha$, $\beta$, $\gamma$; right, the same triangle with the height from the vertex common to $a$ and $b$ onto the extension of $c$, of length $a \sin\beta = b \sin(180^\circ - \alpha)$.]

To prove the first equality relating $a$ and $b$, we consider the distance between the vertex $\gamma$ (common to $a$ and $b$) and its projection onto the extension of $c$. As shown in the left figure, this distance may be computed in two ways, obtaining $a \sin\beta$ and $b \sin(180^\circ - \alpha)$, respectively. The latter may be written as $b \sin(\alpha)$. Hence $a \sin\beta = b \sin\alpha$, which is the first equality. The second equality is proved similarly.

    Appendix 2.E Spaces: Vector; Inner Product; Signal

    2.E.1 Vector Space

Most readers are familiar with the notion of vector space from a linear algebra course. Unfortunately, some linear algebra courses for engineers associate vectors to $n$-tuples rather than taking the axiomatic point of view, which is what we need. Hence we review the vector space axioms.

To Be Added: The axioms of a vector space and a few examples.

While in Chapter 2 all vector spaces are of $n$-tuples over $\mathbb{R}$, in later chapters we deal with the vector space of $n$-tuples over $\mathbb{C}$ and the vector space of finite-energy complex-valued functions. In this appendix we consider general vector spaces over the field of complex numbers. They are commonly called complex vector spaces. Vector spaces for which the scalar field is $\mathbb{R}$ are called real vector spaces.

2.E.2 Inner Product Space

Given a vector space and nothing more, one can introduce the notion of a basis for the vector space, but one does not have the tool needed to define an orthonormal basis.


Indeed, the axioms of a vector space say nothing about geometric ideas such as length or angle. To remedy this, one endows the vector space with the notion of an inner product.

Definition 44. Let $\mathcal{V}$ be a vector space over $\mathbb{C}$. An inner product on $\mathcal{V}$ is a function that assigns to each ordered pair of vectors $\alpha, \beta$ in $\mathcal{V}$ a scalar $\langle \alpha, \beta \rangle$ in $\mathbb{C}$ in such a way that for all $\alpha, \beta, \gamma$ in $\mathcal{V}$ and all scalars $c$ in $\mathbb{C}$:

(a) $\langle \alpha + \beta, \gamma \rangle = \langle \alpha, \gamma \rangle + \langle \beta, \gamma \rangle$ and $\langle c\alpha, \beta \rangle = c \langle \alpha, \beta \rangle$;

(b) $\langle \alpha, \beta \rangle = \langle \beta, \alpha \rangle^*$ (Hermitian symmetry);

(c) $\langle \alpha, \alpha \rangle \geq 0$ with equality if and only if $\alpha = 0$.

It is implicit in (c) that $\langle \alpha, \alpha \rangle$ is real for all $\alpha \in \mathcal{V}$. From (a) and (b), we obtain an additional property:

(d) $\langle \alpha, \beta + \gamma \rangle = \langle \alpha, \beta \rangle + \langle \alpha, \gamma \rangle$ and $\langle \alpha, c\beta \rangle = c^* \langle \alpha, \beta \rangle$.

Notice that the above definition is also valid for a vector space over the field of real numbers, but in this case the complex conjugates appearing in (b) and (d) are superfluous. However, over the field of complex numbers they are necessary: otherwise, for any $\alpha \neq 0$ we could write

$$0 < \langle i\alpha, i\alpha \rangle = -1 \cdot \langle \alpha, \alpha \rangle < 0,$$

which is a contradiction.


By means of the inner product, we introduce the notion of length, called norm, of a vector $\alpha$, via

$$\|\alpha\| = \sqrt{\langle \alpha, \alpha \rangle}.$$

Using linearity, we immediately obtain that the squared norm satisfies

$$\|\alpha \pm \beta\|^2 = \langle \alpha \pm \beta, \alpha \pm \beta \rangle = \|\alpha\|^2 + \|\beta\|^2 \pm 2\operatorname{Re}\{\langle \alpha, \beta \rangle\}. \tag{2.27}$$

The above generalizes $(a \pm b)^2 = a^2 + b^2 \pm 2ab$, $a, b \in \mathbb{R}$, and $|a \pm b|^2 = |a|^2 + |b|^2 \pm 2\operatorname{Re}\{a b^*\}$, $a, b \in \mathbb{C}$.

Theorem 46. If $\mathcal{V}$ is an inner product space, then for any vectors $\alpha, \beta$ in $\mathcal{V}$ and any scalar $c$,

(a) $\|c\alpha\| = |c|\,\|\alpha\|$;

(b) $\|\alpha\| \geq 0$ with equality if and only if $\alpha = 0$;

(c) $|\langle \alpha, \beta \rangle| \leq \|\alpha\|\,\|\beta\|$ with equality if and only if $\alpha = c\beta$ for some $c$ (Cauchy-Schwarz inequality);

(d) $\|\alpha + \beta\| \leq \|\alpha\| + \|\beta\|$ with equality if and only if $\alpha = c\beta$ for some non-negative $c \in \mathbb{R}$ (triangle inequality);

(e) $\|\alpha + \beta\|^2 + \|\alpha - \beta\|^2 = 2(\|\alpha\|^2 + \|\beta\|^2)$ (parallelogram equality).

Proof. Statements (a) and (b) follow immediately from the definitions. We postpone the proof of the Cauchy-Schwarz inequality to Example 50, as at that time we will be able to give a more elegant proof based on the concept of a projection. To prove the triangle inequality, we use (2.27) and the Cauchy-Schwarz inequality applied to $\operatorname{Re}\{\langle \alpha, \beta \rangle\} \leq |\langle \alpha, \beta \rangle|$ to prove that $\|\alpha + \beta\|^2 \leq (\|\alpha\| + \|\beta\|)^2$. Notice that $\operatorname{Re}\{\langle \alpha, \beta \rangle\} \leq |\langle \alpha, \beta \rangle| \leq \|\alpha\|\,\|\beta\|$ holds with equality if and only if $\alpha = c\beta$ for some non-negative $c \in \mathbb{R}$. Hence this condition is necessary for the triangle inequality to hold with equality. It is also sufficient, since then the Cauchy-Schwarz inequality also holds with equality. The parallelogram equality follows immediately from (2.27) used twice, once with each sign.

[Figure: vector diagrams illustrating the triangle inequality $\|\alpha + \beta\| \leq \|\alpha\| + \|\beta\|$ and the parallelogram equality.]


At this point we could use the inner product and the norm to define the angle between two vectors, but we do not have any use for this. Instead, we will make frequent use of the notion of orthogonality. Two vectors $\alpha$ and $\beta$ are defined to be orthogonal if $\langle \alpha, \beta \rangle = 0$.

Example 47. This example is relevant for what we do from Chapter 3 on. Let $\mathcal{W} = \{w_0(t), \ldots, w_{m-1}(t)\}$ be a finite collection of functions from $\mathbb{R}$ to $\mathbb{C}$ such that $\int |w(t)|^2\, dt < \infty$ for all elements of $\mathcal{W}$. Let $\mathcal{V}$ be the complex vector space spanned by the elements of $\mathcal{W}$. The reader should verify that the axioms of a vector space are fulfilled. The standard inner product for functions from $\mathbb{R}$ to $\mathbb{C}$ is defined as

$$\langle \alpha, \beta \rangle = \int \alpha(t) \beta^*(t)\, dt,$$

which implies the norm

$$\|\alpha\| = \sqrt{\int |\alpha(t)|^2\, dt},$$

but it is not a given that this is an inner product on $\mathcal{V}$. It is straightforward to verify that the inner product axioms (a), (b) and (d) (Definition 44) are fulfilled for all elements of $\mathcal{V}$, but axiom (c) is not necessarily fulfilled (see Example 48). If we add the extra condition that for all $\alpha \in \mathcal{V}$, $\langle \alpha, \alpha \rangle = 0$ implies that $\alpha$ is the zero vector, then $\mathcal{V}$ endowed with $\langle \cdot, \cdot \rangle$ forms an inner product space. All we have said in this example applies also for the real vector spaces spanned by functions from $\mathbb{R}$ to $\mathbb{R}$.

Example 48. Let $\mathcal{V}$ be the set of functions from $\mathbb{R}$ to $\mathbb{R}$ spanned by the function that is zero everywhere, except at 0 where it takes value 1. It can easily be checked that this is a vector space. It contains all the functions that are zero everywhere, except at 0 where they can take on any value in $\mathbb{R}$. Its zero vector is the function that is 0 everywhere, including at 0. For all $\alpha$ in $\mathcal{V}$, $\langle \alpha, \alpha \rangle = 0$. Hence $\langle \cdot, \cdot \rangle$ is not an inner product on $\mathcal{V}$.

Theorem 49. (Pythagoras' Theorem) If $\alpha$ and $\beta$ are orthogonal vectors in $\mathcal{V}$, then

$$\|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2.$$

Proof. Pythagoras' theorem follows immediately from the equality $\|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2 + 2\operatorname{Re}\{\langle \alpha, \beta \rangle\}$ and the fact that $\langle \alpha, \beta \rangle = 0$ by definition of orthogonality.

Given two vectors $\alpha, \beta \in \mathcal{V}$, $\beta \neq 0$, we define the projection of $\alpha$ on $\beta$ as the vector $\alpha_{|\beta}$ collinear to $\beta$ (i.e. of the form $c\beta$ for some scalar $c$) such that $\alpha_{\perp\beta} = \alpha - \alpha_{|\beta}$ is orthogonal to $\beta$. Using the definition of orthogonality, what we want is

$$0 = \langle \alpha_{\perp\beta}, \beta \rangle = \langle \alpha - c\beta, \beta \rangle = \langle \alpha, \beta \rangle - c\|\beta\|^2.$$

Solving for $c$ we obtain $c = \frac{\langle \alpha, \beta \rangle}{\|\beta\|^2}$. Hence

$$\alpha_{|\beta} = \frac{\langle \alpha, \beta \rangle}{\|\beta\|^2}\,\beta \quad \text{and} \quad \alpha_{\perp\beta} = \alpha - \alpha_{|\beta}.$$


The projection of $\alpha$ on $\beta$ does not depend on the norm of $\beta$. This is clear from the definition of projection. Alternatively, it can be verified by letting $\gamma = b\beta$ for some $b \in \mathbb{C}$ and by verifying that

$$\alpha_{|\gamma} = \frac{\langle \alpha, b\beta \rangle}{\|b\beta\|^2}\, b\beta = \frac{\langle \alpha, \beta \rangle}{\|\beta\|^2}\, \beta = \alpha_{|\beta}.$$

In particular, when the vector $\beta$ onto which we project has unit norm, we obtain

$$\|\alpha_{|\beta}\| = |\langle \alpha, \beta \rangle|,$$

which tells us that $|\langle \alpha, \beta \rangle|$ has the geometric interpretation of being the length of the projection of $\alpha$ onto the subspace spanned by $\beta$.

[Figure: the projection $\alpha_{|\beta}$ of $\alpha$ on $\beta$.]

Any non-zero vector $\alpha \in \mathcal{V}$ defines a hyperplane by the relationship

$$\{\beta \in \mathcal{V} : \langle \beta, \alpha \rangle = 0\}.$$

The hyperplane is the set of vectors in $\mathcal{V}$ that are orthogonal to $\alpha$. A hyperplane always contains the zero vector.

An affine plane, defined by a vector $\alpha$ and a scalar $c$, is an object of the form

$$\{\beta \in \mathcal{V} : \langle \beta, \alpha \rangle = c\}.$$

The vector $\alpha$ and scalar $c$ that define a hyperplane are not unique, unless we agree to use only normalized vectors to define hyperplanes. By letting $\varphi = \frac{\alpha}{\|\alpha\|}$, the above definition of an affine plane may equivalently be written as $\{\beta \in \mathcal{V} : \langle \beta, \varphi \rangle = \frac{c}{\|\alpha\|}\}$ or even as $\{\beta \in \mathcal{V} : \langle \beta - \frac{c}{\|\alpha\|}\varphi, \varphi \rangle = 0\}$. The first form shows that an affine plane is the set of vectors that have the same projection $\frac{c}{\|\alpha\|}\varphi$ on $\varphi$ (see the next figure). The second form shows that the affine plane is a hyperplane translated by the vector $\frac{c}{\|\alpha\|}\varphi$. Some authors make no distinction between affine planes and hyperplanes; in this case both are called hyperplanes.

[Figure: an affine plane defined by $\alpha$, shown as the set of vectors having the same projection on $\varphi = \alpha/\|\alpha\|$.]


In the example that follows, we use the notion of projection to prove the Cauchy-Schwarz inequality stated in Theorem 46.

Example 50. (Proof of the Cauchy-Schwarz Inequality) The Cauchy-Schwarz inequality states that for any $\alpha, \beta \in \mathcal{V}$, $|\langle \alpha, \beta \rangle| \leq \|\alpha\| \|\beta\|$, with equality if and only if $\alpha = c\beta$ for some scalar $c \in \mathbb{C}$. The statement is obviously true if $\beta = 0$. Assume $\beta \neq 0$ and write $\alpha = \alpha_{|\beta} + \alpha_{\perp\beta}$. (See the next figure.) Pythagoras' theorem states that $\|\alpha\|^2 = \|\alpha_{|\beta}\|^2 + \|\alpha_{\perp\beta}\|^2$. If we drop the second term, which is always non-negative, we obtain $\|\alpha\|^2 \geq \|\alpha_{|\beta}\|^2$ with equality if and only if $\alpha$ and $\beta$ are collinear. From the definition of projection, $\|\alpha_{|\beta}\|^2 = \frac{|\langle \alpha, \beta \rangle|^2}{\|\beta\|^2}$. Hence $\|\alpha\|^2 \geq \frac{|\langle \alpha, \beta \rangle|^2}{\|\beta\|^2}$ with equality if and only if $\alpha$ and $\beta$ are collinear. This is the Cauchy-Schwarz inequality.

[Figure: $\alpha$ decomposed into the projection $\alpha_{|\beta}$ and the orthogonal component $\alpha_{\perp\beta}$, illustrating the Cauchy-Schwarz inequality.]

Every finite-dimensional vector space has a basis. If $\beta_1, \beta_2, \ldots, \beta_n$ is a basis for the inner product space $\mathcal{V}$ and $\alpha \in \mathcal{V}$ is an arbitrary vector, then there are scalars $a_1, \ldots, a_n$ such that $\alpha = \sum_i a_i \beta_i$, but finding them may be difficult. However, finding the coefficients of a vector is particularly easy when the basis is orthonormal.

A basis $\psi_1, \psi_2, \ldots, \psi_n$ for an inner product space $\mathcal{V}$ is orthonormal if

$$\langle \psi_i, \psi_j \rangle = \begin{cases} 0, & i \neq j \\ 1, & i = j. \end{cases}$$

Finding the $i$th coefficient $a_i$ of an orthonormal expansion $\alpha = \sum_i a_i \psi_i$ is immediate. It suffices to observe that all but the $i$th term of $\sum_i a_i \psi_i$ are orthogonal to $\psi_i$ and that the inner product of the $i$th term with $\psi_i$ yields $a_i$. Hence if $\alpha = \sum_i a_i \psi_i$, then

$$a_i = \langle \alpha, \psi_i \rangle.$$

Observe that $|a_i|$ is the norm of the projection of $\alpha$ on $\psi_i$. This should not be surprising, given that the $i$th term of the orthonormal expansion of $\alpha$ is collinear to $\psi_i$ and the sum of all the other terms is orthogonal to $\psi_i$.

There is another major advantage to working with an orthonormal basis. If $a$ and $b$ are the $n$-tuples of coefficients of the expansion of $\alpha$ and $\beta$ with respect to the same orthonormal basis, then

$$\langle \alpha, \beta \rangle = \langle a, b \rangle.$$

[...]


Theorem 52. Let $\mathcal{V}$ be an inner product space and let $\alpha_1, \ldots, \alpha_n$ be any collection of linearly independent vectors in $\mathcal{V}$. Then we may construct orthogonal vectors $\beta_1, \ldots, \beta_n$ in $\mathcal{V}$ such that they form a basis for the subspace spanned by $\alpha_1, \ldots, \alpha_n$.

Proof. The proof is constructive via a procedure known as the Gram-Schmidt orthogonalization procedure. First let $\beta_1 = \alpha_1$. The other vectors are constructed inductively as follows. Suppose $\beta_1, \ldots, \beta_m$ have been chosen so that they form an orthogonal basis for the subspace $U_m$ spanned by $\alpha_1, \ldots, \alpha_m$. We choose the next vector as

$$\beta_{m+1} = \alpha_{m+1} - \alpha_{m+1|U_m}, \tag{2.28}$$

where $\alpha_{m+1|U_m}$ is the projection of $\alpha_{m+1}$ on $U_m$. By definition, $\beta_{m+1}$ is orthogonal to every vector in $U_m$, including $\beta_1, \ldots, \beta_m$. Also, $\beta_{m+1} \neq 0$, for otherwise $\alpha_{m+1}$ contradicts the hypothesis that it is linearly independent of $\alpha_1, \ldots, \alpha_m$. Therefore $\beta_1, \ldots, \beta_{m+1}$ is an orthogonal collection of non-zero vectors in the subspace $U_{m+1}$ spanned by $\alpha_1, \ldots, \alpha_{m+1}$. Therefore it must be a basis for $U_{m+1}$. Thus the vectors $\beta_1, \ldots, \beta_n$ may be constructed one after the other according to (2.28).

Corollary 53. Every finite-dimensional vector space has an orthonormal basis.

Proof. Let $\alpha_1, \ldots, \alpha_n$ be a basis for the finite-dimensional inner product space $\mathcal{V}$. Apply the Gram-Schmidt procedure to find an orthogonal basis $\beta_1, \ldots, \beta_n$. Then $\psi_1, \ldots, \psi_n$, where $\psi_i = \frac{\beta_i}{\|\beta_i\|}$, is an orthonormal basis.

    Gram-Schmidt Orthonormalization Procedure

We summarize the Gram-Schmidt procedure, modified so as to produce orthonormal vectors. If α1, . . . , αn is a linearly independent collection of vectors in the inner product space V, then we may construct a collection ψ1, . . . , ψn that forms an orthonormal basis for the subspace spanned by α1, . . . , αn as follows: We let ψ1 = α1/‖α1‖ and for i = 2, . . . , n we choose

φi = αi − Σ_{j=1}^{i−1} ⟨αi, ψj⟩ ψj
ψi = φi/‖φi‖.

We have assumed that α1, . . . , αn is a linearly independent collection. Now assume that this is not the case. If αj is linearly dependent on α1, . . . , αj−1, then at step i = j the procedure will produce φi = 0. Such vectors are simply disregarded.
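As a complement to the summary above, here is a minimal numerical sketch (an addition, not part of the text) that carries out these steps for vectors in Rⁿ stored as NumPy arrays; the function name and the tolerance used to detect dependent vectors are illustrative choices. It subtracts one projection at a time, which is equivalent to the formula above because the ψj are orthonormal, and numerically more stable.

```python
import numpy as np

def gram_schmidt(alphas, tol=1e-12):
    """Orthonormalize a list of vectors in R^n.

    Implements phi_i = alpha_i - sum_j <alpha_i, psi_j> psi_j and
    psi_i = phi_i / ||phi_i||; vectors that are (numerically) linearly
    dependent on the previous ones give phi_i close to 0 and are skipped.
    """
    psis = []
    for alpha in alphas:
        phi = np.array(alpha, dtype=float)
        for psi in psis:
            phi = phi - np.dot(phi, psi) * psi   # remove the component along psi
        norm = np.linalg.norm(phi)
        if norm > tol:                           # keep only independent directions
            psis.append(phi / norm)
    return psis

# Example: the third vector is the sum of the first two, so it is discarded.
vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]]
basis = gram_schmidt(vectors)
print(len(basis))                                        # 2
print(np.round([np.dot(u, v) for u in basis for v in basis], 6))
```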

    2.E.3 Signal Space

The following table gives an example of the Gram-Schmidt procedure applied to a set of signals.


Table 2.1: Application of the Gram-Schmidt orthonormalization procedure starting with the waveforms given in the first column. For each i the table lists the waveform αi, the coefficients ⟨αi, ψj⟩ for j < i, the projection αi|Vi−1, the difference φi = αi − αi|Vi−1, its norm ‖φi‖, and the resulting orthonormal waveform ψi. (The waveform plots themselves are not reproducible here.)


    Appendix 2.F Exercises

Problem 1. (Probabilities of Basic Events) Assume that X1 and X2 are independent random variables that are uniformly distributed in the interval [0, 1]. Compute the probability of the following events. Hint: For each event, identify the corresponding region inside the unit square.

(a) 0 ≤ X1 − X2 ≤ 1/3.

(b) X1³ ≤ X2 ≤ X1².

(c) X2 − X1 = 1/2.

(d) (X1 − 1/2)² + (X2 − 1/2)² ≤ (1/2)².

(e) Given that X1 ≥ 1/4, compute the probability that (X1 − 1/2)² + (X2 − 1/2)² ≤ (1/2)².
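A quick way to sanity-check whatever probabilities you compute (this is an addition, not part of the exercise) is a Monte Carlo estimate; the event definitions below follow the inequalities as reconstructed above, so adjust them if your reading of the problem differs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)

print(((x1 - x2 >= 0) & (x1 - x2 <= 1/3)).mean())   # event (a)
print(((x1**3 <= x2) & (x2 <= x1**2)).mean())        # event (b)
print(np.isclose(x2 - x1, 0.5).mean())               # event (c): a line has zero area
inside = (x1 - 0.5)**2 + (x2 - 0.5)**2 <= 0.25       # event (d)
print(inside.mean())
cond = x1 >= 0.25                                    # event (e): condition on X1 >= 1/4
print(inside[cond].mean())
```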

    Problem 2. (Basic Probabilities) Find the following probabilities.

(a) A box contains m white and n black balls. Suppose k balls are drawn. Find the probability of drawing at least one white ball.

(b) We have two coins; the first is fair and the second is two-headed. We pick one of the coins at random, we toss it twice and heads shows both times. Find the probability that the coin is fair.
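For part (b), the answer can be cross-checked with a few lines of Bayes' rule; this sketch is an addition that simply encodes the two-coin model described above.

```python
from fractions import Fraction

prior_fair = prior_two_headed = Fraction(1, 2)   # coin picked at random

lik_fair = Fraction(1, 2) ** 2                   # P(two heads | fair coin)
lik_two_headed = Fraction(1, 1)                  # P(two heads | two-headed coin)

posterior_fair = (prior_fair * lik_fair) / (
    prior_fair * lik_fair + prior_two_headed * lik_two_headed)
print(posterior_fair)                            # P(fair | two heads observed)
```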

Problem 3. (Conditional Distribution) Assume that X and Y are random variables with joint probability density function

fX,Y(x, y) = { A,  0 ≤ x < y ≤ 1
               0,  otherwise.

    (a) Are X and Y independent?

    (b) Find the value of A .

(c) Find the marginal distribution of Y. Do this first by arguing geometrically, then compute it formally.

(d) Find φ(y) = E[X | Y = y]. Hint: Argue geometrically.

(e) Find E[φ(Y)] using the marginal distribution of Y.

    (f) Find E [X] and show that E [X] = E [E [X|Y]] .


Problem 4. (Playing Darts) Assume that you are throwing darts at a target. We assume that the target is one-dimensional, i.e., that the darts all end up on a line. The bull's eye is in the center of the line, and we give it the coordinate 0. The position of a dart on the target can then be measured with respect to 0. We assume that the position X1 of a dart that lands on the target is a random variable that has a Gaussian distribution with variance σ1² and mean 0. Assume now that there is a second target, which is further away. If you throw a dart to that target, the position X2 has a Gaussian distribution with variance σ2² (where σ2² > σ1²) and mean 0. You play the following game: You toss a coin which gives you Z = 1 with probability p and Z = 0 with probability 1 − p for some fixed p ∈ [0, 1]. If Z = 1, you throw a dart onto the first target. If Z = 0, you aim for the second target instead. Let X be the relative position of the dart with respect to the center of the target that you have chosen.

(a) Write down X in terms of X1, X2 and Z.

(b) Compute the variance of X. Is X Gaussian?

(c) Let S = |X| be the score, which is given by the distance of the dart to the center of the target (that you picked using the coin). Compute the average score E[S].

Problem 5. (Uncorrelated vs. Independent Random Variables) Let X and Y be two continuous real-valued random variables with joint probability density function fXY.

(a) Show that if X and Y are independent, they are also uncorrelated.

(b) Consider two independent and uniformly distributed random variables U ∈ {0, 1} and V ∈ {0, 1}. Assume that X and Y are defined as follows: X = U + V and Y = |U − V|. Are X and Y independent? Compute the covariance of X and Y. What do you conclude?

Problem 6. (One of Three) Assume you are participating in a quiz show. You are shown three boxes that look identical from the outside, except they have labels 0, 1, and 2, respectively. Only one of them contains one million Swiss francs, the other two contain nothing. You choose one box at random with a uniform probability. Let A be the random variable that denotes your choice, A ∈ {0, 1, 2}.

(a) What is the probability that the box A contains the money?

The quizmaster knows in which box the money is and he now opens, from the remaining two boxes, the one that does not contain the prize. This means that if neither of the two remaining boxes contains the prize, then the quizmaster opens one of them with uniform probability. Otherwise, he simply opens the one that does not contain the prize. Let B denote the random variable corresponding to the box that remains closed after the elimination by the quizmaster.


    (b) What is the probability that B contains the money?

(c) If you are now allowed to change your mind, i.e., choose B instead of sticking with A, would you do it?
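This is the classical quiz-show setup, so whatever you conclude in (b) and (c) can be checked with a short simulation; the sketch below is an addition and implements the quizmaster behaviour exactly as described above.

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 200_000
stay_wins = switch_wins = 0

for _ in range(trials):
    prize = rng.integers(0, 3)          # box containing the money
    a = rng.integers(0, 3)              # your uniformly chosen box A
    # Quizmaster opens a box that is neither A nor the prize (uniformly if tied).
    candidates = [x for x in range(3) if x != a and x != prize]
    opened = candidates[rng.integers(0, len(candidates))]
    b = next(x for x in range(3) if x not in (a, opened))   # the box B left closed
    stay_wins += (a == prize)
    switch_wins += (b == prize)

print(stay_wins / trials, switch_wins / trials)
```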

Problem 7. (Hypothesis Testing: Uniform and Uniform) Consider a binary hypothesis testing problem in which the hypotheses H = 0 and H = 1 occur with probability PH(0) and PH(1) = 1 − PH(0), respectively. The observable Y takes values in {0, 1}^{2k}, where k is a fixed positive integer. When H = 0, each component of Y is 0 or 1 with probability 1/2 and components are independent. When H = 1, Y is chosen uniformly at random from the set of all sequences of length 2k that have an equal number of ones and zeros. There are (2k choose k) such sequences.

(a) What is PY|H(y|0)? What is PY|H(y|1)?

(b) Find a maximum likelihood decision rule for H based on y. What is the single number you need to know about y to implement this decision rule?

(c) Find a decision rule that minimizes the error probability.

(d) Are there values of PH(0) such that the decision rule that minimizes the error probability always decides for one of the two hypotheses regardless of y? If yes, what are these values, and what is the decision?

Problem 8. (The Wetterfrosch) Let us assume that a weather frog bases his forecast of tomorrow's weather entirely on today's air pressure. Determining a weather forecast is a hypothesis testing problem. For simplicity, let us assume that the weather frog only needs to tell us if the forecast for tomorrow's weather is sunshine or rain. Hence we are dealing with binary hypothesis testing. Let H = 0 mean sunshine and H = 1 mean rain. We will assume that both values of H are equally likely, i.e. PH(0) = PH(1) = 1/2.

Measurements over several years have led the weather frog to conclude that on a day that precedes sunshine the pressure may be modeled as a random variable Y with the following probability density function:

fY|H(y|0) = { A − (A/2) y,  0 ≤ y ≤ 1
              0,            otherwise.

Similarly, the pressure on a day that precedes a rainy day is distributed according to

fY|H(y|1) = { B + (B/3) y,  0 ≤ y ≤ 1
              0,            otherwise.

The weather frog's purpose in life is to guess the value of H after measuring Y.


    (a) Determine A and B .

(b) Find the a posteriori probability PH|Y(0|y). Also find PH|Y(1|y).

(c) Plot PH|Y(0|y) and PH|Y(1|y) as a function of y. Show that the implementation of the decision rule Ĥ(y) = arg maxi PH|Y(i|y) reduces to

Ĥ(y) = { 0,  if y ≤ θ
         1,  otherwise,        (2.29)

for some threshold θ and specify the threshold's value. Do so by direct calculation, rather than using the general result (2.4).

(d) Now assume that you implement the decision rule of the form (2.29) with an arbitrary threshold θ, and determine, as a function of θ, the probability that the decision rule decides Ĥ = 1 given that H = 0. This probability is denoted Pr{Ĥ(Y) = 1 | H = 0}.

(e) For the same decision rule, determine the probability of error Pe(θ) as a function of θ. Evaluate your expression at the threshold value found in part (c).

(f) Using calculus, find the θ that minimizes Pe(θ) and compare your result to the threshold of part (c). Could you have found the minimizing θ without any calculation?

Problem 9. (Hypothesis Testing in Laplacian Noise) Consider the following hypothesis testing problem between two equally likely hypotheses. Under hypothesis H = 0, the observable Y is equal to a + Z, where Z is a random variable with Laplacian distribution

fZ(z) = (1/2) e^{−|z|}.

Under hypothesis H = 1, the observable is given by −a + Z. You may assume that a is positive.

(a) Find and draw the density fY|H(y|0) of the observable under hypothesis H = 0, and the density fY|H(y|1) of the observable under hypothesis H = 1.

(b) Find the decision rule that minimizes the probability of error. Write out the expression for the likelihood ratio.

    (c) Compute the probability of error of the optimal decision rule.
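If you want a numerical check of your answers to (b) and (c), the Monte Carlo sketch below is one way to do it; it is an addition, the value of a is arbitrary, and the sign rule it uses is a candidate that you should verify against your own derivation.

```python
import numpy as np

rng = np.random.default_rng(2)
a, n = 1.0, 500_000

z = rng.laplace(loc=0.0, scale=1.0, size=n)    # f_Z(z) = (1/2) exp(-|z|)
h = rng.integers(0, 2, size=n)                 # equally likely hypotheses
y = np.where(h == 0, a + z, -a + z)            # observable under H=0 and H=1

h_hat = np.where(y >= 0, 0, 1)                 # candidate rule: decide 0 when y >= 0
print((h_hat != h).mean())                     # empirical error probability
```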

Problem 10. (Poisson Parameter Estimation) In this example there are two hypotheses, H = 0 and H = 1, which occur with probabilities PH(0) = p0 and PH(1) = 1 − p0,


(d) Repeat (a) and (b) for a general n. Hint: There is no need to repeat every step of your previous derivations.

Problem 12. (Fault Detector) As an engineer, you are required to design the test performed by a fault detector for a black box that produces a sequence of i.i.d. binary random variables X1, X2, X3, . . . . Previous experience shows that this black box has an a priori failure probability of 1/1025. When the black box works properly, pXi(1) = p. When it fails, the output symbols are equally likely to be 0 or 1. Your detector has to decide based on the observation of the past 16 symbols, i.e., at time k the decision will be based on Xk−16, . . . , Xk−1.

    (a) Describe your test.

(b) What does your test decide if it observes the output sequence 0101010101010101? Assume that p = 0.25.
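One possible shape for such a test is a MAP comparison of the posterior odds of "failed" versus "working"; the sketch below is an addition, with the prior and p left as parameters since their values are taken from the problem statement as reconstructed above.

```python
import numpy as np

def decide_failed(window, p=0.25, prior_fail=1/1025):
    """Return True if 'failed' is the a posteriori more probable state.

    window     : the 16 most recent observed bits
    p          : P(X=1) when the box works properly
    prior_fail : a priori failure probability (parameter; value from the statement above)
    """
    window = np.asarray(window)
    k, n = int(window.sum()), len(window)
    lik_ok = (p ** k) * ((1 - p) ** (n - k))    # i.i.d. Bernoulli(p) symbols
    lik_fail = 0.5 ** n                         # symbols uniform when failed
    odds = (prior_fail * lik_fail) / ((1 - prior_fail) * lik_ok)
    return odds > 1.0

print(decide_failed([0, 1] * 8))                # the sequence 0101...01 of part (b)
```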

Problem 13. (MAP Decoding Rule: Alternative Derivation) Consider the binary hypothesis testing problem where H takes values in {0, 1} with probabilities PH(0) and PH(1), and the conditional probability density function of the observation Y ∈ ℝ given H = i, i ∈ {0, 1}, is given by fY|H(·|i). Let Ri be the decoding region for hypothesis i, i.e., the set of y for which the decision is Ĥ = i, i ∈ {0, 1}.

(a) Show that the probability of error is given by

Pe = PH(1) + ∫_{R1} [ PH(0) fY|H(y|0) − PH(1) fY|H(y|1) ] dy.

Hint: Note that ℝ = R0 ∪ R1 and ∫_ℝ fY|H(y|i) dy = 1 for i ∈ {0, 1}.

(b) Argue that Pe is minimized when

R1 = { y ∈ ℝ : PH(0) fY|H(y|0) < PH(1) fY|H(y|1) },

i.e., for the MAP rule.

Problem 14. (One Bit over a Binary Channel with Memory) Consider communicating one bit via n uses of a binary channel with memory. The channel output Yi at time instant i is given by

Yi = Xi ⊕ Zi,   i = 1, . . . , n,


where Xi is the binary channel input, Zi is the binary noise and ⊕ represents modulo-2 addition. All random variables take values in {0, 1}. The noise sequence is generated as follows: Z1 is generated from the distribution PZ1(1) = p and for i > 1,

Zi = Zi−1 ⊕ Ni,

where N2, . . . , Nn are i.i.d. with probability PN(1) = p. Let the codewords (the sequences of symbols sent on the channel) corresponding to messages 0 and 1 be (X1^(0), . . . , Xn^(0)) and (X1^(1), . . . , Xn^(1)), respectively.

(a) Consider the following operation by the receiver. The receiver creates the vector (Ỹ1, Ỹ2, . . . , Ỹn)^T where Ỹ1 = Y1 and for i = 2, 3, . . . , n, Ỹi = Yi ⊕ Yi−1. Argue that the vector created by the receiver is a sufficient statistic. Hint: Show that (Y1, Y2, . . . , Yn) can be reconstructed from (Ỹ1, Ỹ2, . . . , Ỹn).

(b) Write down (Ỹ1, Ỹ2, . . . , Ỹn) for each of the hypotheses. Notice the similarity with the problem of communicating one bit via n uses of a binary symmetric channel.

(c) How should the receiver decide between (X1^(0), . . . , Xn^(0)) and (X1^(1), . . . , Xn^(1)) so as to minimize the probability of error?

Problem 15. (Independent and Identically Distributed versus First-Order Markov) Consider testing two equally likely hypotheses H = 0 and H = 1. The observable Y = (Y1, . . . , Yk)^T is a k-dimensional binary vector. Under H = 0 the components of the vector Y are independent uniform random variables (also called Bernoulli(1/2) random variables). Under H = 1, the component Y1 is also uniform, but the components Yi, 2 ≤ i ≤ k, are distributed as follows:

PYi|Y1,...,Yi−1(yi|y1, . . . , yi−1) = { 3/4,  if yi = yi−1
                                        1/4,  otherwise.        (2.32)

(i) Find the decision rule that minimizes the probability of error. Hint: Write down a short sample sequence (y1, . . . , yk) and determine its probability under each hypothesis. Then generalize.

(ii) Give a simple sufficient statistic for this decision.

(iii) Suppose that the observed sequence alternates between 0 and 1 except for one string of ones of length s, i.e. the observed sequence y looks something like

y = 0101010111111 . . . 111111010101 . . . .        (2.33)

What is the least s such that we decide for hypothesis H = 1?


Problem 16. (Real-Valued Gaussian Random Variables) For the purpose of this problem, two zero-mean real-valued Gaussian random variables X and Y are called jointly Gaussian if and only if their joint density is

fXY(x, y) = (1 / (2π √(det Σ))) exp( −(1/2) (x, y) Σ^{−1} (x, y)^T ),        (2.34)

where (for zero-mean random vectors) the so-called covariance matrix is

Σ = E[ (X, Y)^T (X, Y) ] = ( σX²   σXY
                             σXY   σY² ).        (2.35)

(a) Show that if X and Y are zero-mean jointly Gaussian random variables, then X is a zero-mean Gaussian random variable, and so is Y.

(b) Show that if X and Y are independent zero-mean Gaussian random variables, then X and Y are zero-mean jointly Gaussian random variables.

(c) However, if X and Y are Gaussian random variables but not independent, then X and Y are not necessarily jointly Gaussian. Give an example where X and Y are Gaussian random variables, yet they are not jointly Gaussian.

(d) Let X and Y be independent Gaussian random variables with zero mean and variance σX² and σY², respectively. Find the probability density function of Z = X + Y. Observe that no computation is required if we use the definition of jointly Gaussian random variables given in Appendix 2.C.

Problem 17. (Correlation versus Independence) Let Z be a random variable with probability density function

fZ(z) = { 1/2,  −1 ≤ z ≤ 1
          0,    otherwise.

Also, let X = Z and Y = Z².

    (a) Show that X and Y are uncorrelated.

    (b) Are X and Y independent?

(c) Now let X and Y be jointly Gaussian, zero mean, uncorrelated with variances σX² and σY² respectively. Are X and Y independent? Justify your answer.

Problem 18. (Uniform Polar to Cartesian) Let R and Θ be independent random variables. R is distributed uniformly over the unit interval, Θ is distributed uniformly over the interval [0, 2π).


(a) Interpret R and Θ as the polar coordinates of a point in the plane. It is clear that the point lies inside (or on) the unit circle. Is the distribution of the point uniform over the unit disk? Take a guess!

(b) Define the random variables

X = R cos Θ
Y = R sin Θ.

Find the joint distribution of the random variables X and Y by using the Jacobian determinant.

(c) Does the result of part (b) support or contradict your guess from part (a)? Explain.
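Before (or after) working out the Jacobian, the guess in (a) can be tested empirically; this small sampling sketch is an addition and not required by the exercise.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
r = rng.uniform(0.0, 1.0, n)              # R uniform on [0, 1]
theta = rng.uniform(0.0, 2 * np.pi, n)    # Theta uniform on [0, 2*pi)
x, y = r * np.cos(theta), r * np.sin(theta)

# If (X, Y) were uniform on the unit disk, the fraction of points falling
# inside the disk of radius 1/2 would match its area ratio, (1/2)^2 = 0.25.
print((x**2 + y**2 <= 0.25).mean())
```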

Problem 19. (Sufficient Statistic) Consider a binary hypothesis testing problem specified by:

H = 0 :  Y1 = Z1,   Y2 = Z1 Z2
H = 1 :  Y1 = −Z1,  Y2 = −Z1 Z2

where Z1, Z2 and H are independent random variables. Is Y1 a sufficient statistic? Hint: If Y = aZ for some scalar a, then fY(y) = (1/|a|) fZ(y/a).

Problem 20. (More on Sufficient Statistic) We have seen that if H → T(Y) → Y, then the probability of error Pe of a MAP decoder that decides on the value of H upon observing both T(Y) and Y is the same as that of a MAP decoder that observes only T(Y). It is natural to wonder if the contrary is also true, specifically if the knowledge that Y does not help reduce the error probability that we can achieve with T(Y) implies H → T(Y) → Y. Here is a counter-example. Let the hypothesis H be either 0 or 1 with equal probability (the choice of distribution on H is critical in this example). Let the observable Y take four values with the following conditional probabilities

PY|H(y|0) = { 0.4  if y = 0        PY|H(y|1) = { 0.1  if y = 0
              0.3  if y = 1                      0.2  if y = 1
              0.2  if y = 2                      0.3  if y = 2
              0.1  if y = 3                      0.4  if y = 3

and T(Y) is the following function

T(y) = { 0  if y = 0 or y = 1
         1  if y = 2 or y = 3.


Figure 2.14: (two signal constellations, parameterized by a and b respectively).

(b) For each signal constellation, compute the average energy per symbol E as a function of the parameters a and b, respectively:

E = Σ_{i=1}^{16} PH(i) ‖ci‖²        (2.39)

(c) Plot Pe versus E for both signal constellations and comment.

Problem 24. (Q-Function on Regions, problem from [1]) Let X ∼ N(0, σ²I2). For each of the three figures below, express the probability that X lies in the shaded region. You may use the Q-function when appropriate.

Figure 2.15: (three shaded regions in the (x1, x2) plane).

Problem 25. (QPSK Decision Regions) Let H ∈ {0, 1, 2, 3} and assume that when H = i you transmit the codeword ci shown in Figure 2.16. Under H = i, the receiver observes Y = ci + Z.


Figure 2.16: (QPSK constellation with codewords c0, c1, c2, c3 in the (x1, x2) plane).

(a) Draw the decoding regions assuming that Z ∼ N(0, σ²I2) and that PH(i) = 1/4, i ∈ {0, 1, 2, 3}.

(b) Draw the decoding regions (qualitatively) assuming Z ∼ N(0, σ²I) and PH(0) = PH(2) > PH(1) = PH(3). Justify your answer.

(c) Assume again that PH(i) = 1/4, i ∈ {0, 1, 2, 3}, and that Z ∼ N(0, K), where

K = ( σ²   0
      0    4σ² ).

How do you decode now?

Problem 26. (Properties of the Q Function) Prove properties (a) through (d) of the Q function defined in Section 2.3. Hint: for property (d), multiply and divide inside the integral by the integration variable and integrate by parts. By upper- and lower-bounding the resulting integral, you will obtain the lower and upper bound.
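The bounds referred to in (d) are presumably the standard ones obtained from that integration by parts, Q(x) ≤ e^{−x²/2}/(x√(2π)) and Q(x) ≥ x e^{−x²/2}/((1 + x²)√(2π)) for x > 0; the short script below is an addition that lets you see numerically how tight they are.

```python
from math import erfc, sqrt, pi, exp

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))

for x in (0.5, 1.0, 2.0, 4.0):
    phi = exp(-x * x / 2.0) / sqrt(2.0 * pi)   # standard normal density at x
    upper = phi / x                            # presumed upper bound
    lower = phi * x / (1.0 + x * x)            # presumed lower bound
    print(f"x={x}: lower={lower:.3e}  Q={Q(x):.3e}  upper={upper:.3e}")
```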

Problem 27. (Antenna Array) The following problem relates to the design of multi-antenna systems. Consider the binary equiprobable hypothesis testing problem:

H = 0 : Y1 = A + Z1,   Y2 = A + Z2
H = 1 : Y1 = −A + Z1,  Y2 = −A + Z2,

where Z1, Z2 are independent Gaussian random variables with different variances σ1² ≠ σ2², that is, Z1 ∼ N(0, σ1²) and Z2 ∼ N(0, σ2²). A > 0 is a constant.

(a) Show that the decision rule that minimizes the probability of error (based on the observables Y1 and Y2) can be stated as

σ2² y1 + σ1² y2  ≷  0,

deciding Ĥ = 0 when the left-hand side is positive and Ĥ = 1 otherwise.


(b) Draw the decision regions in the (Y1, Y2) plane for the special case where σ1 = 2σ2.

(c) Evaluate the probability of error for the optimal detector as a function of σ1², σ2² and A.

Problem 28. (Multiple Choice Exam) You are taking a multiple choice exam. Question number 5 allows for two possible answers. According to your first impression, answer 1 is correct with probability 1/4 and answer 2 is correct with probability 3/4. You would like to maximize your chance of giving the correct answer and you decide to have a look at what your neighbors on the left and right have to say. The neighbor on the left has answered HL = 1. He is an excellent student who has a record of being correct 90% of the time when asked a binary question. The neighbor on the right has answered HR = 2. He is a weaker student who is correct 70% of the time.

(a) You decide to use your first impression as a prior and to consider HL and HR as observations. Formulate the decision problem as a hypothesis testing problem.

(b) What is your answer Ĥ?

Problem 29. (Multi-Antenna Receiver) Consider a communication system with one transmitter and n receiver antennas. The n-tuple former output of antenna k, denoted by Yk, is modeled by

Yk = B gk + Zk,   k = 1, 2, . . . , n,

where B ∈ {±1} is a uniformly distributed source bit, gk models the gain of antenna k, and Zk ∼ N(0, σ²). The random variables B, Z1, . . . , Zn are independent. Using n-tuple notation the model becomes

Y = Bg + Z,

where Y, g, and Z are n-tuples.

(a) Suppose that the observation Yk is weighted by an arbitrary real number wk and combined with the other observations to form

V = Σ_{k=1}^{n} Yk wk = ⟨Y, w⟩,

where w is an n-tuple. Describe the ML receiver for B given the observation V. (The receiver knows g and of course knows w.)

(b) Give an expression for the probability of error Pe.

(c) Define η = |⟨g, w⟩| / (‖g‖ ‖w‖) and rewrite the expression for Pe in a form that depends on w only through η.


(d) As a function of w, what are the maximum and minimum values for η and how do you choose w to achieve them?

(e) Minimize the probability of error over all possible choices of w. Could you reduce the error probability further by doing ML decision directly on Y rather than on V? Justify your answer.

(f) How would you choose w to minimize the error probability if Zk had variance σk², k = 1, . . . , n? Hint: With a simple operation at the receiver you can transform the new problem into the one you have already solved.
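To experiment with the effect of the weight vector, the sketch below is an addition that estimates the error probability of the combiner V = ⟨Y, w⟩ for two weight choices; the gains, noise level and weights are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)
n_antennas, sigma, trials = 4, 1.0, 200_000
g = np.array([0.5, 1.0, 1.5, 2.0])                        # illustrative antenna gains

def error_rate(w):
    b = rng.choice([-1.0, 1.0], size=trials)               # source bit B
    z = rng.normal(0.0, sigma, size=(trials, n_antennas))  # noise per antenna
    y = b[:, None] * g[None, :] + z                        # Y = B g + Z
    v = y @ w                                              # V = <Y, w>
    b_hat = np.where(v >= 0, 1.0, -1.0)                    # decide on the sign of V
    return (b_hat != b).mean()

print(error_rate(np.ones(n_antennas)))   # equal weights
print(error_rate(g))                     # weights proportional to the gains
```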

Problem 30. (QAM with Erasure) Consider a QAM receiver that outputs a special symbol called erasure, denoted by δ, whenever the observation falls in the shaded area shown in Figure 2.17. Assume that c0 ∈ ℝ² is transmitted and that Y = c0 + N is received where N ∼ N(0, σ²I2). Let P0i, i = 0, 1, 2, 3, be the probability that the receiver outputs Ĥ = i and let P0δ be the probability that it outputs δ. Determine P00, P01, P02, P03 and P0δ.

Figure 2.17: (four-point QAM constellation c0, c1, c2, c3 in the (y1, y2) plane; the shaded strips between a and b around the decision boundaries form the erasure region).

Comment: If we choose b − a large enough, we can make sure that the probability of error is very small (we say that an error occurred if Ĥ = i, i ∈ {0, 1, 2, 3}, and Ĥ ≠ H). When Ĥ = δ, the receiver can ask for a retransmission of H. This requires a feedback channel from the receiver to the sender. In most practical applications, such a feedback channel is available.

Problem 31. (Repeat Codes and Bhattacharyya Bound) Consider two equally likely hypotheses. Under hypothesis H = 0, the transmitter sends c0 = (1, . . . , 1)^T and under H = 1 it sends c1 = (−1, . . . , −1)^T, both of length n. The channel model is the AWGN with variance σ² in each component. Recall that the probability of error for a ML receiver


that observes the channel output Y ∈ ℝⁿ is

Pe = Q(√n / σ).

Suppose now that the decoder has access only to the sign of Yi, 1 ≤ i ≤ n. In other words, the observation is

W = (W1, . . . , Wn) = (sign(Y1), . . . , sign(Yn)).        (2.40)

(a) Determine the MAP decision rule based on the observable W. Give a simple sufficient statistic, and draw a diagram of the optimal receiver.

(b) Find the expression for the probability of error P̃e of the MAP decoder that observes W. You may assume that n is odd.

(c) Your answer to (b) contains a sum that cannot be expressed in closed form. Express the Bhattacharyya bound on P̃e.

(d) For n = 1, 3, 5, 7, find the numerical values of Pe, P̃e, and the Bhattacharyya bound on P̃e.
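For part (d) it can be reassuring to compare the numbers you compute against a simulation; the sketch below is an addition that estimates the error probability of the ML receiver that sees Y and of a majority vote on the signs W (whether the majority vote is indeed the MAP rule of part (a) is for you to verify).

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, trials = 1.0, 500_000

for n in (1, 3, 5, 7):
    h = rng.integers(0, 2, size=trials)                 # hypothesis 0 or 1
    c = np.where(h[:, None] == 0, 1.0, -1.0)            # c0 = (1,...,1), c1 = (-1,...,-1)
    y = c + rng.normal(0.0, sigma, size=(trials, n))
    soft = np.where(y.sum(axis=1) >= 0, 0, 1)           # ML decision on Y
    hard = np.where(np.sign(y).sum(axis=1) >= 0, 0, 1)  # majority vote on the signs
    print(n, (soft != h).mean(), (hard != h).mean())
```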

Problem 32. (Tighter Union Bhattacharyya Bound: Binary Case) In this problem we derive a tighter version of the Union Bhattacharyya Bound for binary hypotheses. Let

H = 0 : Y ∼ fY|H(y|0)
H = 1 : Y ∼ fY|H(y|1).

The MAP decision rule is

Ĥ(y) = arg max_i PH(i) fY|H(y|i),

and the resulting probability of error is

Pe = PH(0) ∫_{R1} fY|H(y|0) dy + PH(1) ∫_{R0} fY|H(y|1) dy.

(a) Argue that

Pe = ∫_y min{ PH(0) fY|H(y|0), PH(1) fY|H(y|1) } dy.

(b) Prove that for a, b ≥ 0, min(a, b) ≤ √(ab) ≤ (a + b)/2. Use this to prove the tighter version of the Bhattacharyya Bound, i.e.,

Pe ≤ (1/2) ∫_y √( fY|H(y|0) fY|H(y|1) ) dy.


(c) Compare the above bound to the one derived in class when PH(0) = 1/2. How do you explain the improvement by a factor 1/2?

Problem 33. (Tighter Union Bhattacharyya Bound: M-ary Case) In this problem we derive a tight version of the union bound for M-ary hypotheses. Let us analyze the following M-ary MAP detector:

Ĥ(y) = smallest i such that PH(i) fY|H(y|i) = max_j { PH(j) fY|H(y|j) }.

Let

Bi,j = { y : PH(j) fY|H(y|j) ≥ PH(i) fY|H(y|i) },  j < i
       { y : PH(j) fY|H(y|j) > PH(i) fY|H(y|i) },  j > i.

(a) Verify that Bi,j = B^c_{j,i}.

(b) Given H = i, the detector will make an error if and only if y ∈ ∪_{j: j≠i} Bi,j. The probability of error is Pe = Σ_{i=0}^{M−1} Pe(i) PH(i). Show that:

Pe ≤ Σ_{i=0}^{M−1} Σ_{j>i} [ Pr{Bi,j | H = i} PH(i) + Pr{Bj,i | H = j} PH(j) ]

   = Σ_{i=0}^{M−1} Σ_{j>i} [ ∫_{Bi,j} fY|H(y|i) PH(i) dy + ∫_{B^c_{i,j}} fY|H(y|j) PH(j) dy ]

   = Σ_{i=0}^{M−1} Σ_{j>i} ∫_y min{ fY|H(y|i) PH(i), fY|H(y|j) PH(j) } dy.

Hint: Use the union bound and then group the terms corresponding to Bi,j and Bj,i. To prove the last part, go back to the definition of Bi,j.

(c) Hence show that:

Pe ≤ Σ_{i=0}^{M−1} Σ_{j>i} [ (PH(i) + PH(j)) / 2 ] ∫_y √( fY|H(y|i) fY|H(y|j) ) dy.

(Hint: For a, b ≥ 0, min(a, b) ≤ √(ab) ≤ (a + b)/2.)

Problem 34. (Applying the Tight Bhattacharyya Bound) As an application of the tight Bhattacharyya bound (Problem 32), consider the following binary hypothesis testing problem

H = 0 : Y ∼ N(−a, σ²)
H = 1 : Y ∼ N(+a, σ²),

where the two hypotheses are equiprobable.


(a) Use the Tight Bhattacharyya Bound to derive a bound on Pe.

(b) We know that the probability of error for this binary hypothesis testing problem is Q(a/σ) ≤ (1/2) exp( −a²/(2σ²) ), where we have used the result Q(x) ≤ (1/2) exp( −x²/2 ). How do the two bounds compare? Comment on the result.
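A small numerical comparison of the exact error probability, the Q-function bound quoted in (b), and the tight Bhattacharyya bound of (a) can be produced as follows; this is an addition, the parameter values are arbitrary, and the Bhattacharyya integral is evaluated numerically so that you can compare it with whatever closed form you derive.

```python
import numpy as np
from math import erfc, sqrt, exp, pi

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

a, sigma = 1.0, 1.0

pe_exact = Q(a / sigma)                         # exact error probability
pe_qbound = 0.5 * exp(-a**2 / (2 * sigma**2))   # bound quoted in part (b)

# Tight Bhattacharyya bound of part (a): (1/2) * integral of sqrt(f0 f1) dy,
# evaluated here by a simple Riemann sum over a wide grid.
y = np.linspace(-20 * sigma, 20 * sigma, 400_001)
f0 = np.exp(-(y + a) ** 2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
f1 = np.exp(-(y - a) ** 2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
pe_bhatt = 0.5 * np.sum(np.sqrt(f0 * f1)) * (y[1] - y[0])

print(pe_exact, pe_qbound, pe_bhatt)
```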

Problem 35. (Bhattacharyya Bound for DMCs) Consider a Discrete Memoryless Channel (DMC). This is a channel model described by an input alphabet X, an output alphabet Y and a transition probability⁵ PY|X(y|x). When we use this channel to transmit an n-tuple x ∈ Xⁿ, the transition probability is

PY|X(y|x) = ∏_{i=1}^{n} PY|X(yi|xi).

So far, we have come across two DMCs, namely the BSC (Binary Symmetric Channel) and the BEC (Binary Erasure Channel). The purpose of this problem is to see that for DMCs, the Bhattacharyya Bound takes on a simple form, in particular when the channel input alphabet X contains only two letters.

(a) Consider a transmitter that sends c0 ∈ Xⁿ when H = 0 and c1 ∈ Xⁿ when H = 1. Justify the following chain of inequalities.

Pe  (a)≤  Σ_y √( PY|X(y|c0) PY|X(y|c1) )

    (b)=  Σ_y √( ∏_{i=1}^{n} PY|X(yi|c0,i) PY|X(yi|c1,i) )

    (c)=  Σ_{y1,...,yn} ∏_{i=1}^{n} √( PY|X(yi|c0,i) PY|X(yi|c1,i) )

    (d)=  [ Σ_{y1} √( PY|X(y1|c0,1) PY|X(y1|c1,1) ) ] · · · [ Σ_{yn} √( PY|X(yn|c0,n) PY|X(yn|c1,n) ) ]

    (e)=  ∏_{i=1}^{n} Σ_y √( PY|X(y|c0,i) PY|X(y|c1,i) )

    (f)=  ∏_{a∈X, b∈X, a≠b} [ Σ_y √( PY|X(y|a) PY|X(y|b) ) ]^{n(a,b)},

where n(a, b) is the number of positions i in which c0,i = a and c1,i = b.

⁵Here we are assuming that the output alphabet is discrete. Otherwise we use densities instead of probabilities.


(b) The Hamming distance dH(c0, c1) is defined as the number of positions in which c0 and c1 differ. Show that for a binary input channel, i.e., when X = {a, b}, the Bhattacharyya Bound becomes

Pe ≤ z^{dH(c0, c1)},

where

z = Σ_y √( PY|X(y|a) PY|X(y|b) ).

Notice that z depends only on the channel, whereas its exponent depends only on c0 and c1.

(c) What is z for

(i) The binary input Gaussian channel described by the densities

fY|X(y|0) = N(√E, σ²)
fY|X(y|1) = N(−√E, σ²).

(ii) The Binary Symmetric Channel (BSC) with the transition probabilities described by

PY|X(y|x) = { 1 − δ,  if y = x,
              δ,      otherwise.

(iii) The Binary Erasure Channel (BEC) with the transition probabilities given by

PY|X(y|x) = { 1 − δ,  if y = x,
              δ,      if y = E,
              0,      otherwise.

(iv) Consider a channel with input alphabet {±1}, and output Y = sign(x + Z), where x is the input and Z ∼ N(0, σ²). This is a BSC obtained from quantizing a Gaussian channel used with binary input alphabet. What is the crossover probability p of the BSC? Plot the z of the underlying Gaussian channel (with inputs in ℝ) and that of the BSC. By how much do we need to increase the input power of the quantized channel to match the z of the unquantized channel?

    Problem36. (Bh