
Chapter 3

Receiver Design for the Waveform AWGN Channel

    3.1 Introduction

In Chapter 2 we focused on the receiver for the discrete-time AWGN (Additive White Gaussian Noise) channel. In this chapter, we address the same problem for a channel model closer to reality, namely the waveform AWGN channel. Apart from the channel model, the assumptions and the goal are the same: We assume that the source and the transmitter are given to us and we seek to understand what the receiver has to do to minimize the error probability. We are also interested in the resulting error probability, but this follows from Chapter 2 with no extra work. The setup is shown in Figure 3.1.

[Figure 3.1: Communication across the AWGN channel. The transmitter maps the hypothesis $H = i$ into the waveform $w_i(t)$; the channel adds white Gaussian noise $N(t)$; the receiver observes $R(t)$ and produces the estimate $\hat{H}$.]

The channel of Figure 3.1 captures the most important aspect of all channels, namely the presence of additive noise. Due to the central limit theorem, the assumption that the noise is Gaussian is often a very good one. In Section 3.5 we discuss additional channel properties that also affect the design and performance of a communication system.

Example 54. A cable is a good example of a channel that can be modeled by the continuous-time AWGN channel. If the cable's frequency response cannot be considered as constant over the signal's bandwidth, then the cable's filtering effect also needs to be taken into consideration. We discuss this in Section 3.5. Another good example is the channel seen between the antenna of a geostationary satellite and the antenna of the corresponding earth station.

Although our primary focus is on the receiver, in this chapter we also gain valuable insight into the transmitter structure. The problem of choosing a suitable set $\mathcal{W} = \{w_0(t), \ldots, w_{m-1}(t)\}$ of signals will be studied in subsequent chapters. For now, we just require that the following two very mild conditions be satisfied: (i) $\int |w_i(t)|^2\, dt < \infty$ …


already what the decoder of Figure 3.2 should do with the sufficient statistic produced by the n-tuple former.

A point about notation needs to be clarified. As customary in engineering practice, a function of time $g$ is denoted by $g(t)$. Nevertheless, the inner product of $g(t)$ and $h(t)$ is denoted $\langle g, h \rangle$. This will remind us that we are viewing $g(t)$ and $h(t)$ as vectors.

    3.2 Gaussian Processes and White Gaussian Noise

In this section we assume a working knowledge of Gaussian random vectors (reviewed in Appendix 2.C) and of the notion of wide-sense stationarity. Recall that a stochastic process $N(t)$ is wide-sense stationary (WSS) if the mean $E[N(t)]$, as well as the autocovariance $K_N(t+\tau, t) := E[N(t+\tau)N^*(t)]$, do not depend on $t$.

A stochastic process is completely characterized by specifying the joint distribution of each finite collection of samples. For a Gaussian random process, the samples are jointly Gaussian:

Definition 55. $N(t)$ is a Gaussian random process if for any finite collection of times $t_1, t_2, \ldots, t_k$, the vector $Z = (N(t_1), N(t_2), \ldots, N(t_k))^T$ of samples is a Gaussian random vector. A second process $\tilde{N}(t)$ is jointly Gaussian with $N(t)$ if $Z$ and $\tilde{Z}$ are jointly Gaussian random vectors for any vector $Z$ consisting of samples from $N(t)$ and any vector $\tilde{Z}$ consisting of samples from $\tilde{N}(t)$.

Next we define white Gaussian noise. We give two definitions. The first definition is quite common in communication textbooks. It has the merit of being straightforward and it is a handy tool in certain derivations, notably in deriving the spectrum of filtered white noise. Unfortunately, as explained below, this definition is not entirely satisfactory. For this reason, we give a second definition that we find preferable. Both lead to the same result. An example of a standard derivation done both ways is given at the end of the section (Example 59).

    3.2.1 Dirac-Delta-Based Definition of White Gaussian Noise

White Gaussian noise can be defined as a zero-mean WSS Gaussian random process $N(t)$ of autocovariance $K_N(\tau) = \frac{N_0}{2}\delta(\tau)$. We now show a calculation using this definition.

Example 56. Let $g_1(t)$ and $g_2(t)$ be two finite-energy pulses and for $i = 1, 2$, define

$$Z_i = \int N(\alpha) g_i(\alpha)\, d\alpha,$$

where $N(t)$ is white Gaussian noise as we just defined it. We compute the covariance $\mathrm{cov}(Z_i, Z_j)$ as follows:

$$\begin{aligned}
\mathrm{cov}(Z_i, Z_j) &:= E\left[ Z_i Z_j^* \right] \\
&= E\left[ \int N(\alpha) g_i(\alpha)\, d\alpha \int N^*(\beta) g_j^*(\beta)\, d\beta \right] \\
&= \iint E\left[ N(\alpha) N^*(\beta) \right] g_i(\alpha) g_j^*(\beta)\, d\alpha\, d\beta \\
&= \iint \frac{N_0}{2}\, \delta(\alpha - \beta)\, g_i(\alpha) g_j^*(\beta)\, d\alpha\, d\beta \\
&= \frac{N_0}{2} \int g_i(\alpha) g_j^*(\alpha)\, d\alpha.
\end{aligned}$$
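This covariance formula is easy to verify numerically. The following is a minimal sanity check, not from the text: the pulses $g_1$, $g_2$ and all parameter values are arbitrary illustrative choices. At sampling step $\Delta$, i.i.d. Gaussian samples of variance $N_0/(2\Delta)$ emulate white noise of power spectral density $N_0/2$, and Riemann sums replace the integrals.

```python
import numpy as np

# Monte Carlo check of cov(Z_i, Z_j) = (N0/2) <g_i, g_j>.
# White noise of PSD N0/2 is emulated by i.i.d. Gaussian samples of
# variance (N0/2)/dt, so integrals against finite-energy pulses have
# the correct second-order statistics.
rng = np.random.default_rng(0)
N0, dt, T = 2.0, 1e-2, 1.0
t = np.arange(0.0, T, dt)

g1 = np.ones_like(t)                 # rectangular pulse on [0, T)
g2 = np.sin(2 * np.pi * t / T)       # one sine period on [0, T)

trials = 20000
noise = rng.normal(0.0, np.sqrt(N0 / (2 * dt)), size=(trials, t.size))
Z1 = noise @ g1 * dt                 # Z_i = sum_k N(t_k) g_i(t_k) dt
Z2 = noise @ g2 * dt

print("empirical:", np.cov(Z1, Z2, bias=True))
print("theory 11:", N0 / 2 * np.sum(g1 * g1) * dt)   # (N0/2)<g1,g1> = 1.0
print("theory 22:", N0 / 2 * np.sum(g2 * g2) * dt)   # (N0/2)<g2,g2> = 0.5
print("theory 12:", N0 / 2 * np.sum(g1 * g2) * dt)   # (N0/2)<g1,g2> = 0.0
```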

This definition is easily remembered but is unsatisfactory in a number of ways. First, a process that has autocovariance $K_N(\tau) = \frac{N_0}{2}\delta(\tau)$ cannot be an accurate model for a physical signal. In fact, recall that the Fourier transform of the autocovariance $K_N(\tau)$ is the power spectral density $S_N(f)$ (also called the power spectrum). If $K_N(\tau) = \frac{N_0}{2}\delta(\tau)$, then $S_N(f) = \frac{N_0}{2}$, i.e., a constant. Integrating the power spectral density yields the power, which in this case is infinite. A related problem shows up from a different angle when we try to determine the variance of a sample $N(t_0)$ for an arbitrary time $t_0$. This is the autocovariance $K_N(\tau)$ evaluated at $\tau = 0$, but a Dirac delta is not defined as a stand-alone function.¹ Since we think of a Dirac delta as a very narrow and very tall function of unit area, we could argue that $\delta(0) = \infty$. This too is unsatisfactory, because we would rather avoid having to define Gaussian random variables of infinite variance. More precisely, a stochastic process is characterized by specifying the joint distribution of each finite collection of samples, which implies that we would have to define the density of any collection of Gaussian random variables of infinite variance.

    3.2.2 Observation-Based Definition of White Gaussian Noise

The good news is that with a bit more patience we can define white Gaussian noise in a way that is consistent with the physical world and that avoids the use of the Dirac delta function. Instead of defining Gaussian noise as a stand-alone entity, we define what happens when we observe Gaussian noise. Let us say we would like to observe $N(t)$ at some arbitrary time $t = t_0$. If $N(t)$ is the noise in a wire, the measurement would be made via a probe. The tip of the probe is a piece of metal, i.e., a linear time-invariant system of some (finite-energy) impulse response $h(t)$. Hence through this probe we can observe the process

$$Z(t) = \int N(\alpha) h(t - \alpha)\, d\alpha \qquad (3.1)$$

¹Recall that a Dirac delta function is defined through what happens when we integrate it against a function, i.e., through the relationship $\int \delta(t) g(t)\, dt = g(0)$.


but we cannot observe $N(t)$ itself. The same happens when a receiver tries to observe the received signal. Specifically, the contribution of the noise $N(t)$ to a measurement performed by the receiver has the form (3.1).

Recall that a stochastic process is described by the distribution of all the finite collections of samples. If we sample $Z(t)$ at time $t_0$ we obtain

$$Z(t_0) = \int N(\alpha) h(t_0 - \alpha)\, d\alpha = \int N(\alpha) g(\alpha)\, d\alpha,$$

where we have defined $g(\alpha) = h(t_0 - \alpha)$. We conclude that to describe the object that we call white Gaussian noise it suffices to specify the statistic of

$$Z = (Z_1, \ldots, Z_k)^T, \quad \text{where} \qquad (3.2)$$

$$Z_i = \int N(\alpha) g_i(\alpha)\, d\alpha, \qquad (3.3)$$

for any collection $g_1(t), \ldots, g_k(t)$ of finite-energy impulse responses. We are now ready to give our working definition of white Gaussian noise.

Definition 57. $N(t)$ is white Gaussian noise of power spectral density $\frac{N_0}{2}$ if for any finite collection of $\mathcal{L}_2$ functions $g_1(t), \ldots, g_k(t)$,

$$Z_i = \int N(\alpha) g_i(\alpha)\, d\alpha, \quad i = 1, 2, \ldots, k \qquad (3.4)$$

is a collection of zero-mean jointly Gaussian random variables of covariance

$$\mathrm{cov}(Z_i, Z_j) = E\left[ Z_i Z_j^* \right] = \frac{N_0}{2} \int g_i(t) g_j^*(t)\, dt = \frac{N_0}{2} \langle g_i, g_j \rangle. \qquad (3.5)$$

Due to its importance and frequent use, we formulate the following special case as a lemma.

Lemma 58. Let $Z$ be defined as in (3.2)-(3.3) with $\{g_1(t), \ldots, g_k(t)\}$ being an orthonormal set of functions. Then $Z \sim \mathcal{N}(0, \frac{N_0}{2} I_k)$.

    Proof. The straightforward proof is left as an exercise.

The example that follows is a typical calculation. We calculate using both definitions of white Gaussian noise.

Example 59. Let $N(t)$ be white Gaussian noise at the input of a linear time-invariant circuit of impulse response $h(t)$. Compute the autocovariance of the output $Z(t) = \int N(\alpha) h(t - \alpha)\, d\alpha$.


Solution: The definition of autocovariance is $K_Z(\tau) := E[Z(t+\tau)Z(t)]$. The computation using the Dirac-delta-based definition mimics the derivation in Example 56. The result is $K_Z(\tau) = \frac{N_0}{2} \int h(t+\tau)h(t)\, dt$. If we use the observation-based definition, we do not need to calculate (but we do need to know (3.5), which is part of the definition). In fact the $Z_i$ and $Z_j$ defined in (3.4) and used in (3.5) become $Z(t+\tau)$ and $Z(t)$ if we set $g_i(\alpha) = h(t + \tau - \alpha)$ and $g_j(\alpha) = h(t - \alpha)$, respectively. Hence we can read the result directly out of (3.5), namely

$$K_Z(\tau) = \frac{N_0}{2} \int h(t+\tau-\alpha)h(t-\alpha)\, d\alpha = \frac{N_0}{2} \int h(\beta+\tau)h(\beta)\, d\beta = \frac{N_0}{2} K_h(\tau),$$

where $K_h(\tau) := \int h(t+\tau)h(t)\, dt$ is the autocovariance² of $h(t)$.
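The identity $K_Z(\tau) = \frac{N_0}{2}K_h(\tau)$ can also be checked by simulation in the same spirit as the check after Example 56. The sketch below is illustrative only; the rectangular $h(t)$ and all parameter values are assumptions, not from the text.

```python
import numpy as np

# Check K_Z(tau) = (N0/2) K_h(tau) for white noise through an LTI filter.
rng = np.random.default_rng(1)
N0, dt, n_samp = 2.0, 1e-2, 200
t = np.arange(n_samp) * dt
h = (t < 0.2).astype(float)          # rectangular impulse response, 0.2 s long

trials = 20000
noise = rng.normal(0.0, np.sqrt(N0 / (2 * dt)), size=(trials, n_samp))
Z = dt * np.array([np.convolve(n, h)[:n_samp] for n in noise])

t0 = 100                             # time index well past the filter transient
for lag in (0, 5, 10, 25):
    emp = np.mean(Z[:, t0 + lag] * Z[:, t0])        # estimate of K_Z(lag*dt)
    Kh = dt * np.dot(h[lag:], h[:n_samp - lag])     # K_h(lag*dt)
    print(f"tau={lag * dt:4.2f}  empirical={emp: .3f}  theory={N0 / 2 * Kh: .3f}")
```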

We conclude with observations that are by no means essential. The reader eager to move on may skip to the next section with no loss of continuity.

The Dirac-delta-based definition of white Gaussian noise is easily remembered, but some calculations must be made to obtain the frequently used result worked out in Example 56. The measurement-based definition has that result built into it.

If we define $N(t)$ as a stochastic process, then we need to argue that an integral such as $\int N(t)h(t)\, dt$ is well defined. This can be done rigorously but requires measure theory. The problem does not exist if we use the observation-based definition, since the object $\int N(t)h(t)\, dt$ is, by definition, a Gaussian random variable.

Defining an object indirectly through its behavior, as we have done with the measurement-based definition of white Gaussian noise, is not new. We do something similar when we introduce the Dirac delta function by saying that it fulfills the relationship $\int f(t)\delta(t)\, dt = f(0)$. In both cases, we introduce the object of interest by saying how it behaves when integrated against a generic function.

The reason the physically-unsustainable Dirac-delta-based model leads to physically meaningful results is quite clear. As every wire is ultimately a filter, we are interested in the effect of white noise after it passes through a filter. Real-world filters have a finite-support frequency response. The noise components outside this finite support have no effect. When we talk about white noise, we think of a WSS stochastic process that has a flat power spectral density over this finite support. The noise that we are trying to model has a flat power spectral density over a frequency interval wider than the frequency-response support of any filter we may consider in practice. To have a physically-meaningful definition of the process that we call white noise, we should fix the interval over which the process power spectral density is flat. This is a catch-22 situation, because if we commit to such an interval, then we can immediately exhibit an impulse response for which the noise power spectral density is not wide enough. Hence we let the model go to infinity. The input process no longer exists, but it leads to the correct output process no matter what the filter's frequency response is. We see that the two definitions of white noise are quite similar in spirit: in both cases, the focus is not on the noise itself but on what we obtain when the noise is filtered.

²Recall that for a deterministic signal $h(t)$ the autocovariance is defined as $K_h(\tau) := \int h(t+\tau)h(t)\, dt$, whereas for a stochastic process $Z(t)$ it is defined as $K_Z(\tau) := E[Z(t+\tau)Z(t)]$. The relationship between the two becomes more evident for ergodic processes. In fact, for an ergodic process, the time average over any sufficiently long sample path converges to the ensemble average. Even if we are only concerned with ergodic processes, we write $K_Z(\tau) := E[Z(t+\tau)Z(t)]$ rather than $\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} z(t+\tau)z(t)\, dt$, because the former is the one we use to determine $K_Z(\tau)$.

    3.3 Observables and Sufficient Statistic

By assumption, our channel output is $R(t) = w_i(t) + N(t)$ for some $i \in \mathcal{H}$, where $N(t)$ is white Gaussian noise. As discussed in the previous section, due to the noise, the channel output $R(t)$ is not observable. What we can observe via measurements is the integral of $R(t)$ against any number of finite-energy waveforms.

Hence we can consider as the observable any $k$-tuple $V = (V_1, \ldots, V_k)^T$ such that

$$V_i = \int R(\alpha) g_i^*(\alpha)\, d\alpha, \quad i = 1, 2, \ldots, k, \qquad (3.6)$$

where $g_1(t), \ldots, g_k(t)$ are arbitrary finite-energy waveforms. The complex conjugate operator on $g_i^*(\alpha)$ is superfluous for real-valued signals but, as we will see in Chapter 7, the baseband representation of passband signals is complex-valued.

Notice that we are assuming that we can perform an arbitrarily large but finite number $k$ of measurements. Ruling out infinitely many measurements allows us to avoid distracting mathematical subtleties without losing anything of engineering relevance.

It is important to point out that the kind of measurements we are considering is quite general. For instance, we can pass $R(t)$ through an ideal lowpass filter of cutoff frequency $B$ for some huge $B$ (say $10^{10}$ Hz) and collect an arbitrarily large number of samples taken every $\frac{1}{2B}$ seconds so as to fulfill the sampling theorem (Theorem 72). In fact, by choosing $g_i(t) = h(\frac{i}{2B} - t)$, where $h(t)$ is the impulse response of the lowpass filter, $V_i$ becomes the filter output sampled at time $t = \frac{i}{2B}$. As stated by the sampling theorem, from these samples we can reconstruct the filter output. If $R(t)$ is band-limited and $B$ is sufficiently large, then the filter output is identical to the filter input. In this case, from the measurements we can reconstruct $R(t)$.

Let $\mathcal{V}$ be the inner product space spanned by the elements of the signal set $\mathcal{W}$ and let $\{\phi_1(t), \ldots, \phi_n(t)\}$ be an arbitrary orthonormal basis for $\mathcal{V}$. We claim that the $n$-tuple $Y = (Y_1, \ldots, Y_n)^T$ with $i$th component

$$Y_i = \int R(\alpha) \phi_i^*(\alpha)\, d\alpha$$

is a sufficient statistic among any collection of measurements that contains $Y$. To prove this claim, let $V = (V_1, \ldots, V_k)^T$ be the collection of additional measurements made according to (3.6). Let $\mathcal{U}$ be the inner product space spanned by $\mathcal{V} \cup \{g_1(t), \ldots, g_k(t)\}$ and let $\{\phi_1(t), \ldots, \phi_n(t), \psi_1(t), \ldots, \psi_{\tilde{n}}(t)\}$ be an orthonormal basis for $\mathcal{U}$ obtained by extending the orthonormal basis $\{\phi_1(t), \ldots, \phi_n(t)\}$ for $\mathcal{V}$. Define

$$U_i = \int R(\alpha) \psi_i^*(\alpha)\, d\alpha, \quad i = 1, \ldots, \tilde{n}.$$

It should be clear that we can recover $(Y, U)$ from $(Y, V)$. In words, this is so because from the projections onto a basis, we can obtain the projection onto any waveform in the span of the basis. Mathematically,

$$V_i = \int R(\alpha) g_i^*(\alpha)\, d\alpha = \int R(\alpha) \Big[ \sum_{j=1}^{n} \alpha_{i,j} \phi_j(\alpha) + \sum_{j=1}^{\tilde{n}} \alpha_{i,j+n} \psi_j(\alpha) \Big]^* d\alpha = \sum_{j=1}^{n} \alpha_{i,j}^* Y_j + \sum_{j=1}^{\tilde{n}} \alpha_{i,j+n}^* U_j,$$

where $\alpha_{i,1}, \ldots, \alpha_{i,n+\tilde{n}}$ is the unique set of coefficients in the orthonormal expansion of $g_i(t)$ with respect to the basis $\{\phi_1(t), \ldots, \phi_n(t), \psi_1(t), \ldots, \psi_{\tilde{n}}(t)\}$. Hence we can consider $(Y, U)$ as the observable, and it suffices to show that $Y$ is a sufficient statistic. Note that when $H = i$,

$$Y_j = \int R(\alpha) \phi_j^*(\alpha)\, d\alpha = \int \left[ w_i(\alpha) + N(\alpha) \right] \phi_j^*(\alpha)\, d\alpha = c_{i,j} + Z_{|\mathcal{V},j},$$

where $c_{i,j}$ is the $j$th component of the $n$-tuple of coefficients $c_i$ that represents the waveform $w_i(t)$ with respect to the chosen orthonormal basis, and $Z_{|\mathcal{V},j}$ is a zero-mean Gaussian random variable of variance $\frac{N_0}{2}$. The notation $Z_{|\mathcal{V},j}$ is meant to remind us that this random variable is obtained by projecting the noise onto the $j$th element of the chosen orthonormal basis for $\mathcal{V}$. Using $n$-tuple notation, we obtain the following statistic:

$$H = i, \quad Y = c_i + Z_{|\mathcal{V}},$$

where $Z_{|\mathcal{V}} \sim \mathcal{N}(0, \frac{N_0}{2} I_n)$. Similarly,

$$U_j = \int R(\alpha) \psi_j^*(\alpha)\, d\alpha = \int \left[ w_i(\alpha) + N(\alpha) \right] \psi_j^*(\alpha)\, d\alpha = \int N(\alpha) \psi_j^*(\alpha)\, d\alpha = Z_{\perp\mathcal{V},j},$$

where we used the fact that $w_i(t)$ is in the subspace spanned by $\{\phi_1(t), \ldots, \phi_n(t)\}$ and therefore it is orthogonal to $\psi_j(t)$ for each $j = 1, 2, \ldots, \tilde{n}$. Using $\tilde{n}$-tuple notation, we obtain

$$H = i, \quad U = Z_{\perp\mathcal{V}},$$

where $Z_{\perp\mathcal{V}} \sim \mathcal{N}(0, \frac{N_0}{2} I_{\tilde{n}})$. Furthermore, $Z_{|\mathcal{V}}$ and $Z_{\perp\mathcal{V}}$ are independent of each other and of $H$. The conditional density of $Y, U$ given $H$ is

$$f_{Y,U|H}(y, u|i) = f_{Y|H}(y|i) f_U(u).$$


[Figure 3.3: The vector of measurements $(Y^T, U^T)^T$ describes the projection of the received signal $R(t)$ onto $\mathcal{U}$. The vector $Y$ describes the projection of $R(t)$ onto $\mathcal{V}$.]


From the Fisher-Neyman factorization theorem (Theorem 17, Chapter 2), we know that $Y$ is a sufficient statistic and $U$ is irrelevant, as claimed.

Figure 3.3 depicts what is going on. The vector $Y$ that describes the projection of the received $R(t)$ onto the signal space $\mathcal{V}$ consists of $c_i$ plus a noise vector that also lies in $\mathcal{V}$. The vector $U$ that contains the remaining measurements is not affected by the signal $w_i(t)$ or, equivalently, by $c_i$. Hence the channel output $R(t)$ consists of $w_i(t)$ (which lives in $\mathcal{V}$) plus a component of the noise that also lives in $\mathcal{V}$ plus a component of the noise that is orthogonal to $\mathcal{V}$. The measurements that produce $Y$ remove the third component by projecting $R(t)$ onto $\mathcal{V}$. More precisely, $Y$ is the vector of coefficients that describes that projection with respect to the chosen basis. Mathematically,

$Y = c_i + Z_{|\mathcal{V}}$ describes the projection of $R(t)$ onto the signal space $\mathcal{V}$.
$U = Z_{\perp\mathcal{V}}$ describes the noise that lives in $\mathcal{U}$ but is orthogonal to $\mathcal{V}$.

We have proved that $Y$ is a sufficient statistic. Could we prove that a subset of the components of $Y$ is not a sufficient statistic? Yes, we could. Here is the outline of a proof. Without loss of essential generality, let us think of $Y$ as consisting of two parts, $Y_a$ and $Y_b$. Similarly, we decompose every $c_i$ into the corresponding parts $c_{ia}$ and $c_{ib}$. The claim is that $H$ followed by $Y_a$ followed by $(Y_a, Y_b)$ does not form a Markov chain in that order. In fact, when $H = i$, $Y_b$ consists of $c_{ib}$ plus noise. Since $c_{ib}$ cannot be deduced from $Y_a$ in general (or else we would not bother sending $c_{ib}$), it follows that the statistic of $Y_b$ depends on $i$ even if we know the realization of $Y_a$.
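As an illustration of this decomposition at work, here is a minimal numerical sketch of the n-tuple former. All parameters are illustrative assumptions, not from the text: two orthonormal rectangular basis functions, a codeword $c_i$, and discretized white noise. Projecting $R(t)$ onto the basis recovers $c_i$ plus $\mathcal{N}(0, N_0/2)$ noise in each component.

```python
import numpy as np

# Sketch of the n-tuple former: Y_j = <R, phi_j> with R(t) = w_i(t) + N(t).
rng = np.random.default_rng(2)
N0, dt = 0.5, 1e-3
t = np.arange(0.0, 1.0, dt)
phi1 = np.where(t < 0.5, np.sqrt(2.0), 0.0)    # unit-norm on [0, 0.5)
phi2 = np.where(t >= 0.5, np.sqrt(2.0), 0.0)   # unit-norm on [0.5, 1)
ci = np.array([1.0, -2.0])                     # transmitted codeword
w = ci[0] * phi1 + ci[1] * phi2                # waveform former output

R = w + rng.normal(0.0, np.sqrt(N0 / (2 * dt)), size=t.size)
Y = np.array([np.sum(R * phi1) * dt, np.sum(R * phi2) * dt])
print("Y =", Y)   # ci plus independent N(0, N0/2) noise on each component
```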

    3.4 Transmitter and Receiver Architecture

From the discussion of the previous section, we know that a MAP receiver for the waveform AWGN channel can be structured as shown in Figure 3.4. We see that the receiver front-end computes $Y$ from $R(t)$ in a block that we call the n-tuple former. (The name is not standard.) The hypothesis testing problem based on the observable $Y$ is

$$H = i: \quad Y = c_i + Z,$$

where $Z \sim \mathcal{N}(0, \frac{N_0}{2} I_n)$ is independent of $H$. This is precisely the hypothesis testing problem studied in Chapter 2 in conjunction with an encoder that used the codeword $c_i$ to signal message $i$ across the discrete-time AWGN channel. To leverage what we learned in Chapter 2, we also decompose the transmitter into a module that produces $c_i$, called the encoder as in Chapter 2, and one that produces $w_i(t)$. We call the latter the waveform former. (Once again, the terminology is not standard.) Henceforth the $n$-tuple of coefficients $c_i$ will be referred to as the codeword associated to $w_i(t)$.

Everything that we learned about a decoder for the discrete-time AWGN channel is applicable to the decoder of the continuous-time AWGN channel.


[Figure 3.4: Decomposed transmitter and receiver. The encoder maps $i \in \mathcal{H}$ into the codeword $(c_{i,1}, \ldots, c_{i,n})$; the waveform former produces $w_i(t) = \sum_j c_{i,j} \phi_j(t)$; the AWGN channel adds $N(t)$; the n-tuple former correlates $R(t)$ against $\phi_1(t), \ldots, \phi_n(t)$ (multiply and integrate) to produce $Y_1, \ldots, Y_n$; and the decoder outputs $\hat{H}$.]

Incidentally, the decomposition of Figure 3.4 is consistent with the layering philosophy of the OSI model (Section 1.1), in the sense that the encoder and decoder are designed as if they were talking to one another directly via a discrete-time AWGN channel. In reality, the channel seen by the encoder/decoder pair is the result of the service provided by the waveform former and the n-tuple former.

The above decomposition is useful for the system conception, for the performance analysis, as well as for the system implementation; but of course, we always have the option of implementing the transmitter as a straight map from the message set $\mathcal{H}$ to the waveform set $\mathcal{W}$ without passing through the codebook $\mathcal{C}$. Although such a straight map is a possibility and makes sense for relatively unsophisticated systems, the decomposition into an encoder and a waveform former is standard for modern designs. In fact, information theory, as well as coding theory, devotes much attention to the study of encoder/decoder pairs. (Roughly speaking, information theory tries to discern the possible from the impossible, whereas coding theory seeks implementable solutions.)

The following example is meant to make two points that apply when we communicate across the AWGN channel and make an ML decision: First, two constellations of continuous-time signals may look very different, yet they may share the same codebook, which is sufficient to guarantee that the error probability is the same; second, for binary constellations, what matters for the error probability is the distance between the two signals and nothing else.


Example 60. The following four choices of $\mathcal{W} = \{w_0(t), w_1(t)\}$ look very different yet, upon an appropriate choice of orthonormal basis, they share the same codebook $\mathcal{C} = \{c_0, c_1\}$ with $c_0 = (\sqrt{E}, 0)^T$ and $c_1 = (0, \sqrt{E})^T$. To see this, it suffices to verify that in all cases $\langle w_i, w_j \rangle = E\,\delta_{ij}$, where $\delta_{ij}$ equals 1 if $i = j$ and 0 otherwise. (A numerical check of this property for the first two choices is sketched after Choice 4.)

Choice 1 (Rectangular Pulse Position Modulation):

$$w_0(t) = \sqrt{\frac{E}{T}}\, \mathbb{1}_{[0,T]}(t), \qquad w_1(t) = \sqrt{\frac{E}{T}}\, \mathbb{1}_{[T,2T]}(t),$$

where we have used the indicator function $\mathbb{1}_I(t)$ to denote a rectangular pulse which is 1 in the interval $I$ and 0 elsewhere. Rectangular pulses can easily be generated, e.g. by a switch. They are used, for instance, to communicate a binary symbol within an electrical circuit. As we will see, in the frequency domain these pulses have side lobes that decay relatively slowly, which is not desirable for high data rates over a channel for which bandwidth is at a premium.

Choice 2 (Frequency Shift Keying):

$$w_0(t) = \sqrt{\frac{2E}{T}} \sin\left(\pi k \frac{t}{T}\right) \mathbb{1}_{[0,T]}(t), \qquad w_1(t) = \sqrt{\frac{2E}{T}} \sin\left(\pi l \frac{t}{T}\right) \mathbb{1}_{[0,T]}(t),$$

where $k$ and $l$ are positive integers, $k \neq l$. With a large value of $k$ and $l$, these signals could be used for wireless communication. To see that the two signals are orthogonal to one another, we can use the trigonometric identity $\sin(\alpha)\sin(\beta) = 0.5[\cos(\alpha - \beta) - \cos(\alpha + \beta)]$.

Choice 3 (Sinc Pulse Position Modulation):

$$w_0(t) = \sqrt{\frac{E}{T}}\, \mathrm{sinc}\left(\frac{t}{T}\right), \qquad w_1(t) = \sqrt{\frac{E}{T}}\, \mathrm{sinc}\left(\frac{t - T}{T}\right).$$

An advantage of sinc pulses is that they have finite support in the frequency domain. By taking their Fourier transform, we quickly see that they are orthogonal to one another. See Appendix 5.B for details.


Choice 4 (Spread Spectrum):

$$w_0(t) = \sqrt{\frac{E}{T}} \sum_{j=1}^{n} c_{0,j}\, \mathbb{1}_{[0,\frac{T}{n}]}\!\left(t - j\frac{T}{n}\right), \qquad w_1(t) = \sqrt{\frac{E}{T}} \sum_{j=1}^{n} c_{1,j}\, \mathbb{1}_{[0,\frac{T}{n}]}\!\left(t - j\frac{T}{n}\right),$$

where $c_0 = (c_{0,1}, \ldots, c_{0,n})^T \in \{\pm 1\}^n$ and $c_1 = (c_{1,1}, \ldots, c_{1,n})^T \in \{\pm 1\}^n$ are orthogonal. This signaling method is called spread spectrum. It uses much bandwidth, but it has an inherent robustness with respect to interfering (non-white) signals.
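Here is the promised numerical check of $\langle w_i, w_j \rangle = E\,\delta_{ij}$ for Choices 1 and 2. The parameter values ($E = 3$, $T = 1$, $k = 5$, $l = 7$) are illustrative assumptions, not from the text, and Choice 2 is taken in the $\sin(\pi k t/T)$ form written above.

```python
import numpy as np

# Verify <w_i, w_j> = E * delta_ij for Choices 1 and 2.
E, T, dt = 3.0, 1.0, 1e-5
t = np.arange(0.0, 2 * T, dt)

def ip(a, b):                        # <a, b> = integral of a(t) b(t) dt
    return np.sum(a * b) * dt

# Choice 1: rectangular pulse position modulation
w0 = np.sqrt(E / T) * ((t >= 0) & (t < T))
w1 = np.sqrt(E / T) * ((t >= T) & (t < 2 * T))
print("Choice 1:", ip(w0, w0), ip(w1, w1), ip(w0, w1))   # ~E, ~E, ~0

# Choice 2: frequency shift keying with k = 5, l = 7
k, l = 5, 7
box = (t < T).astype(float)
w0 = np.sqrt(2 * E / T) * np.sin(np.pi * k * t / T) * box
w1 = np.sqrt(2 * E / T) * np.sin(np.pi * l * t / T) * box
print("Choice 2:", ip(w0, w0), ip(w1, w1), ip(w0, w1))   # ~E, ~E, ~0
```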

Now assume that we use one of the above choices to communicate across a continuous-time AWGN channel and that the receiver implements an ML decision rule. Since the codebook $\mathcal{C}$ is the same in all cases, the decoder and the error probability will be identical no matter which choice we make.

Computing the error probability is particularly easy when there are only two codewords. From the previous chapter we know that $P_e = Q\left(\frac{\|c_1 - c_0\|}{2\sigma}\right)$, where $\sigma^2 = \frac{N_0}{2}$. The distance

$$\|c_1 - c_0\| := \sqrt{\sum_{i=1}^{n} (c_{1,i} - c_{0,i})^2} = \sqrt{2E}$$

can also be computed as

$$\|w_1 - w_0\| := \sqrt{\int [w_1(t) - w_0(t)]^2\, dt},$$

which requires neither an orthonormal basis nor the codebook. Yet another alternative is to use Pythagoras' theorem. As we know already that our signals have squared norm $E$ and are orthogonal to one another, their distance is $\sqrt{\|w_0\|^2 + \|w_1\|^2} = \sqrt{2E}$. Putting things together,

$$P_e = Q\left(\sqrt{\frac{E}{N_0}}\right).$$
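A Monte Carlo simulation working directly at the codeword level confirms this formula. The sketch below assumes illustrative values $E = 2$, $N_0 = 1$ (not from the text):

```python
import numpy as np
from math import erfc, sqrt

# Simulated error probability for two orthogonal signals of energy E,
# compared against Pe = Q(sqrt(E/N0)).
rng = np.random.default_rng(3)
E, N0, trials = 2.0, 1.0, 200_000

Q = lambda x: 0.5 * erfc(x / sqrt(2))   # Q-function via the complementary erf
C = np.sqrt(E) * np.eye(2)              # c0 = (sqrt(E), 0), c1 = (0, sqrt(E))

i = rng.integers(0, 2, size=trials)     # equiprobable messages
y = C[i] + rng.normal(0.0, np.sqrt(N0 / 2), size=(trials, 2))
i_hat = np.argmax(y @ C.T, axis=1)      # ML: maximize <y, c_j> (equal norms)
print("simulated Pe :", np.mean(i_hat != i))
print("Q(sqrt(E/N0)):", Q(sqrt(E / N0)))
```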

    3.4.1 Alternative Receiver Structures

It is interesting to explore a refinement and a variation of the receiver structure shown in Figure 3.4. We start by restating the setup.

The source produces $H = i$ with probability $P_H(i)$, $i \in \mathcal{H}$. When $H = i$, the channel output is $R(t) = w_i(t) + N(t)$, where $w_i(t) \in \mathcal{W}$, $\mathcal{W} = \{w_0(t), w_1(t), \ldots, w_{m-1}(t)\}$ is the signal constellation, assumed to be known to the receiver, and $N(t)$ is white Gaussian noise.


We find an arbitrary orthonormal basis $\{\phi_1(t), \ldots, \phi_n(t)\}$ for the vector space $\mathcal{V}$ spanned by $\mathcal{W}$. This can always be done using the Gram-Schmidt procedure, but sometimes we can choose by hand a more convenient orthonormal basis. At the receiver, we obtain a sufficient statistic by projecting the received signal $R(t)$ onto each of the basis vectors.³ The result is

$$Y = (Y_1, Y_2, \ldots, Y_n)^T \quad \text{where} \quad Y_i = \langle R, \phi_i \rangle, \quad i = 1, \ldots, n.$$

We now face a hypothesis testing problem with prior $P_H(i)$, $i \in \mathcal{H}$, and observable $Y$ distributed according to

$$f_{Y|H}(y|i) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{\|y - c_i\|^2}{2\sigma^2} \right),$$

where $\sigma^2 = \frac{N_0}{2}$. A MAP receiver that observes $Y = y$ decides $\hat{H} = i$ for one of the $i \in \mathcal{H}$ that maximize $P_H(i) f_{Y|H}(y|i)$ or any monotonic function thereof. Since $f_{Y|H}(y|i)$ is an exponential function of $y$, we simplify the test by taking the logarithm. We also remove terms that do not depend on $i$ and rescale, keeping in mind that if we scale with a negative number, we have to change the maximization into minimization. We choose $N_0$ as the scaling factor and obtain the first of the following equivalent MAP decision rules, where $q_i = \frac{1}{2}\left( N_0 \ln P_H(i) - \|c_i\|^2 \right)$.

Equivalent MAP decision rules:

(i) Choose $\hat{H}$ as one of the $i \in \mathcal{H}$ that minimizes $-N_0 \ln P_H(i) + \|y - c_i\|^2$.

(ii) Choose $\hat{H}$ as one of the $i \in \mathcal{H}$ that maximizes $\Re\{\langle y, c_i \rangle\} + q_i$.

(iii) Choose $\hat{H}$ as one of the $i \in \mathcal{H}$ that maximizes $\Re\left\{ \int r(t) w_i^*(t)\, dt \right\} + q_i$.

To see that (i) and (ii) are equivalent, we use $\|y - c_i\|^2 = \langle y - c_i, y - c_i \rangle = \|y\|^2 - 2\Re\{\langle y, c_i \rangle\} + \|c_i\|^2$, which applies to complex-valued $n$-tuples. Rules (ii) and (iii) are equivalent since

$$\int r(t) w_i^*(t)\, dt = \int r(t) \Big[ \sum_j c_{i,j} \phi_j(t) \Big]^* dt = \sum_j y_j c_{i,j}^* = \langle y, c_i \rangle.$$
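To see the equivalence in action, here is a small sketch (the codewords and the prior are illustrative assumptions, not from the text) checking that rules (i) and (ii) select the same index:

```python
import numpy as np

# Rules (i) and (ii) give the same decision: the argmin of
# -N0*ln P_H(j) + ||y - c_j||^2 equals the argmax of <y, c_j> + q_j.
rng = np.random.default_rng(4)
N0 = 2.0
C = rng.normal(size=(4, 3))                    # four codewords in R^3 (rows)
prior = np.array([0.4, 0.3, 0.2, 0.1])         # non-uniform prior
q = 0.5 * (N0 * np.log(prior) - np.sum(C**2, axis=1))

y = C[2] + rng.normal(0.0, np.sqrt(N0 / 2), size=3)    # H = 2 was sent

rule_i = np.argmin(-N0 * np.log(prior) + np.sum((y - C)**2, axis=1))
rule_ii = np.argmax(C @ y + q)
print("rule (i):", rule_i, " rule (ii):", rule_ii)     # always identical
```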

Each of the above equivalent MAP rules requires performing operations of the kind

$$\int r(t) b(t)\, dt, \qquad (3.7)$$

where $b(t)$ is some fixed function ($\phi_j(t)$ or $w_j(t)$). There are two ways to implement (3.7). The obvious way, shown in Figure 3.5 (a), is by means of a so-called correlator.

³We make a slight abuse of notation and terminology here. What we mean is an operation of the kind $\int R(t) \phi_i^*(t)\, dt$. Because $R(t)$ is not an element of an inner product space that we have defined, strictly speaking we are not allowed to denote this operation $\langle R, \phi_i \rangle$ and to talk about projection. We indulge in this abuse because the notation is convenient and there is no danger of confusion.


[Figure 3.5: Two ways to implement $\int r(t)b(t)\,dt$: (a) a correlator, which multiplies $r(t)$ by $b(t)$ and integrates; (b) a matched filter with impulse response $b(T - t)$, whose output is sampled at $t = T$.]

Think of a correlator as a device that multiplies and integrates two input signals. The other way to implement $\int r(t)b(t)\,dt$ is via a so-called matched filter. This is a filter that takes $r(t)$ as the input and has $h(t) = b(T - t)$ as impulse response (Figure 3.5(b)), where $T$ is an arbitrary design parameter selected in such a way as to make $h(t)$ a causal impulse response. The output signal $y(t)$ and its value at time $T$ are, respectively,

$$y(t) = \int r(\alpha) h(t - \alpha)\, d\alpha = \int r(\alpha) b(T - t + \alpha)\, d\alpha,$$

$$y(T) = \int r(\alpha) b(\alpha)\, d\alpha.$$

We see that the latter is indeed (3.7).
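Numerically, the two implementations of (3.7) agree exactly. Here is a minimal check with an arbitrary triangular $b(t)$ and a noisy $r(t)$, both illustrative assumptions:

```python
import numpy as np

# The matched filter h(t) = b(T - t), sampled at t = T, equals the correlator.
rng = np.random.default_rng(5)
dt, T = 1e-3, 1.0
t = np.arange(0.0, T, dt)

b = 1.0 - np.abs(2 * t / T - 1.0)           # triangular pulse on [0, T]
r = b + rng.normal(0.0, 0.5, size=t.size)   # some received waveform

correlator = np.sum(r * b) * dt             # integral of r(t) b(t) dt
h = b[::-1]                                 # h(t) = b(T - t), discretized
matched = (np.convolve(r, h) * dt)[t.size - 1]   # filter output sampled at T
print(correlator, matched)                  # identical up to rounding
```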

Each of the above three MAP decision rules has a block diagram representation in Figure 3.6. In each case the front-end has been implemented using matched filters, but correlators could also be used, as in Figure 3.4.

Whether we use matched filters or correlators depends on the technology and on the waveforms. Implementing a correlator in analog technology is costly. But if the processing is done by a microprocessor that has enough computational power, then a correlation can be done at no additional hardware cost. We would be inclined to use matched filters if there were easy-to-implement filters of the desired impulse response. In Problem 5 of this chapter, we give an example where the matched filters can be implemented with passive components.

The top diagram in Figure 3.6 corresponds to the receiver in Figure 3.4, except for matched filters instead of correlators. The second (from the top) diagram is more specific than the first in terms of the operation that the decoder needs to do. The third requires neither an orthonormal basis nor knowledge of the codebook, but it does require $m$ as opposed to $n$ correlators or matched filters. We know that $m \ge n$, and often $m$ is much larger than $n$. Notice also that the block diagram at the bottom of Figure 3.6 does not quite fit into the decomposition of Figure 3.2. In fact the receiver bypasses the need for the $n$-tuple $Y$. As a byproduct, this proves that the receiver performance is not affected by the choice of an orthonormal basis.


[Figure 3.6: Three block diagrams of an optimal receiver for the waveform AWGN channel. Top: an n-tuple former (matched filters $\phi_1(T-t), \ldots, \phi_n(T-t)$ sampled at $t = T$) followed by a MAP decoder. Middle: the same n-tuple former followed by a decoder that forms $\langle Y, c_0 \rangle + q_0, \ldots, \langle Y, c_{m-1} \rangle + q_{m-1}$ via a weighing matrix and selects the largest. Bottom: an alternative to the n-tuple former, with matched filters $w_0(T-t), \ldots, w_{m-1}(T-t)$ sampled at $t = T$, the biases $q_j$ added, and the largest selected. Each receiver front-end can alternatively be implemented via correlators.]


3.5 Continuous-Time Channels Revisited

Propagation Delay and Clock Misalignment: Propagation delay is the time it takes a signal to reach the receiver. If the signal set is $\mathcal{W} = \{w_0(t), w_1(t), \ldots, w_{m-1}(t)\}$ and the propagation delay is $\tau$, then for the receiver it is as if the signal set were $\mathcal{W}_\tau = \{w_0(t-\tau), w_1(t-\tau), \ldots, w_{m-1}(t-\tau)\}$ and there were no propagation delay. The common assumption is that the receiver does not know $\tau$ when the communication starts. For instance, in wireless communication, a receiver has no way to know that the propagation delay has changed because the transmitter has moved while it was turned off. It is the responsibility of the receiver to determine $\tau$ from the received signal. We come to the same conclusion when we consider the fact that the clocks of different devices are often not synchronized. If the clock of the receiver reads $t + \tau$ when that of the transmitter reads $t$, then, once again, for the receiver the signal set is $\mathcal{W}_\tau$ for some unknown $\tau$ to be estimated when the communication starts. Estimating $\tau$ goes under the general name of clock synchronization. For reasons that will become clear, the clock synchronization problem decomposes into the symbol synchronization and the phase synchronization problems, discussed in Sections 5.5 and 7.6. Until then and unless otherwise specified, we assume that there is no propagation delay and that all clocks are synchronized.

Filtering: In wireless communication, due to reflections and diffractions off obstacles, the electromagnetic signal emitted by the transmitter reaches the receiver via multiple paths. Each path has its own delay and attenuation. If $w_i(t)$ is transmitted, the receiver antenna output has the form $R(t) = \sum_{l=1}^{L} w_i(t - \tau_l) h_l$ plus noise, where $\tau_l$ and $h_l$ are the delay and the attenuation along the $l$-th path. Unlike a mirror, the rough surface of certain objects creates a large number of small reflections that are best accounted for by the integral form $R(t) = \int w_i(t - \tau) h(\tau)\, d\tau$ plus noise. This is the same as saying that the channel contains a filter of impulse response $h(t)$. For a different reason, the same channel model applies to wireline communication. In fact, due to dispersion, the channel output to a unit-energy pulse applied to the input at time $t = 0$ is some impulse response $h(t)$. Owing to the channel linearity, the output due to $w_i(t)$ at the input is, once again, $R(t) = \int w_i(t - \tau) h(\tau)\, d\tau$ plus noise.

The possibilities we have to cope with the channel filtering depend on whether the channel impulse response is known to the receiver alone, to both the transmitter and the receiver, or to neither. (It is hard to imagine a situation where only the transmitter knows the channel impulse response.)

If the transmitter uses the signal set $\mathcal{W} = \{w_0(t), w_1(t), \ldots, w_{m-1}(t)\}$ and the receiver knows $h(t)$, from the receiver's point of view the signal set is $\tilde{\mathcal{W}}$ with the $i$th signal being $\tilde{w}_i(t) = (w_i \star h)(t)$, and the channel just adds white Gaussian noise. This is the familiar case. Realistically, the receiver knows at best an estimate $\hat{h}(t)$ of $h(t)$ and uses it as the actual channel impulse response.

The most challenging situation occurs when the receiver does not know and cannot estimate $h(t)$. This is a realistic assumption in bursty communication, when a burst is too short for the receiver to estimate $h(t)$ and the impulse response changes from one burst to the next.


The most favorable situation occurs when both the receiver and the transmitter know $h(t)$ or an estimate thereof. Typically it is the receiver that estimates and communicates the channel impulse response. This requires two-way communication, which is often available. In this case, the transmitter can adapt the signal constellation to the channel characteristic. Arguably, the best strategy is the so-called water-filling (see e.g. [2]), which can be implemented via orthogonal frequency division multiplexing (OFDM).

We have assumed that the channel impulse response characterizes the channel filtering for the duration of the transmission. If the transmitter and/or the receiver move, which is often the case in mobile communication, then the channel is still linear but time-varying. Excellent graduate-level textbooks that discuss this kind of channel are [6] and [7].

Colored Gaussian Noise: We can think of colored noise as filtered white noise. It is safe to assume that, over the frequency range of interest, i.e., the frequency range occupied by the information-carrying signals, there is no positive-length interval over which there is no noise. (A frequency interval with no noise is physically unjustifiable, and if we insist on such a channel model, we no longer have an interesting communication problem, because we can transmit infinitely many bits error-free by signaling where there is no noise.) This implies that the frequency response of the noise-shaping filter cannot vanish over a positive-length interval in the frequency range of interest. We can modify the aforementioned noise-reduction filter in such a way that, in the frequency range of interest, it has the inverse frequency response of the noise-shaping filter. The noise at the output of the modified noise-reduction filter, called the whitening filter, is zero-mean, Gaussian, and white. The minimum error probability with the whitening filter cannot be higher than without it, because the filter is invertible in the frequency range of interest. What we gain with the noise-whitening filter is that we are back to the familiar situation where the noise is white and the signal set is $\tilde{\mathcal{W}} = \{\tilde{w}_0(t), \tilde{w}_1(t), \ldots, \tilde{w}_{m-1}(t)\}$, with $\tilde{w}_i(t) = (w_i \star h)(t)$ and $h(t)$ known to the receiver. See Exercise ?? (to be added) for details.
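The whitening operation is easy to visualize numerically. The sketch below is a minimal illustration, in which the noise-shaping response $H(f)$ and all parameter values are assumptions for the demo, not from the text: coloring white noise by $H(f)$ and then filtering by $1/H(f)$ over the band where $H$ is nonzero recovers the original white noise.

```python
import numpy as np

# Whitening-filter sketch: colored noise = white noise through H(f);
# filtering by 1/H(f) (nonzero over the band of interest) undoes the shaping.
rng = np.random.default_rng(6)
n, dt = 2**14, 1e-3
white = rng.normal(size=n)
f = np.fft.rfftfreq(n, dt)
H = 1.0 / np.sqrt(1.0 + (f / 50.0)**2)         # a noise-shaping filter (assumed)

colored = np.fft.irfft(np.fft.rfft(white) * H, n)
whitened = np.fft.irfft(np.fft.rfft(colored) / H, n)
print(np.allclose(whitened, white))            # inversion recovers the white noise
```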

    3.6 Summary and Outlook

In this chapter we have addressed the problem of communicating a message across a waveform AWGN channel. The importance of the continuous-time AWGN channel model comes from the fact that every conductor is a linear time-invariant system that smooths out and adds up the voltages created by the electrons' Brownian motion. Due to the central limit theorem, the result of adding up many contributions can be modeled as white Gaussian noise. No conductor can escape this phenomenon, unless it is cooled to zero kelvin. This does not imply that the continuous-time AWGN channel is the only channel model of interest. Depending on the situation, there can be other impairments, such as fading, nonlinearities, and interference, that should be incorporated into the channel model, but they are outside the scope of this text.

As in the previous chapter, we have focused primarily on the receiver that minimizes the error probability, assuming that the signal set is given to us. We were able to move forward swiftly by identifying a sufficient statistic that reduces the receiver design problem to the one studied in Chapter 2. The receiver consists of an n-tuple former and a decoder. We have seen that the sender can also be decomposed into an encoder and a waveform former. This decomposition naturally fits the layering philosophy discussed in the introductory chapter: The waveform former at the sender and the n-tuple former at the receiver can be seen as providing a service to the encoder-decoder pair. The service consists in making the continuous-time AWGN channel look like a discrete-time AWGN channel.

How do we proceed from here? First, we need to discuss what is important in terms of performance measures, how they relate to one another, and what options we have to control them. We start this discussion in the next chapter, where we also develop some intuition about the kind of signals we want to use. Second, we need to start paying attention to cost and complexity, because they can quickly get out of hand. In general the receiver is far more challenging than the transmitter. In the n-tuple former, we need to be concerned about the number of correlators or matched filters. The decoder complexity is affected by the search to find the codeword that maximizes $\Re\{\langle y, c_j \rangle\} + q_j$. In a modern design, the number $k$ of bits that we encode into a codeword and the codeword length $n$ may vary over a wide range. To give an idea, $k = 10^4$ and $n = 2k$ are not unusual. For a brute-force implementation, the n-tuple former requires $n = 2 \times 10^4$ correlators or matched filters, and the decoder needs to compute $\Re\{\langle y, c_j \rangle\} + q_j$ for $m = 2^k \approx 10^{3000}$ codewords. Let us make a back-of-the-envelope calculation to get an idea of what this means. If we have a computer with a clock of 1 THz and a hardware accelerator that allows us to compute $\Re\{\langle y, c_j \rangle\} + q_j$ in one clock cycle, then each second we evaluate $10^{12}$ such expressions, and it will take roughly $3 \times 10^{2980}$ years to do this for all $m$ codewords. The big bang happened an estimated $13.75 \times 10^9$ years ago. We see that we could have a hardware cost and computational complexity problem. In Chapter 5 we will learn how to choose the waveform former in such a way that the n-tuple former can be implemented with a single matched filter. In Chapter 6 we will see that there are encoders for which the decoder needs to explore a number of possibilities that grows linearly in $k$ rather than exponentially.
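The arithmetic behind this estimate is easy to reproduce, using the same rounding as the text ($2^{10} \approx 10^3$, so $m = 2^k \approx 10^{3000}$ for $k = 10^4$):

```python
from math import log10

# Back-of-the-envelope decoding time for brute-force MAP with k = 10^4.
k = 10**4                       # bits per codeword
log10_m = 3 * (k // 10)         # log10 of the number of codewords, ~3000
log10_seconds = log10_m - 12    # at 10^12 metric evaluations per second (1 THz)
log10_years = log10_seconds - log10(3.15e7)    # ~3.15e7 seconds per year
print(f"roughly 10^{log10_years:.1f} years")   # ~10^2980.5, i.e., ~3 x 10^2980
```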


    Appendix 3.A Exercises

Problem 1. (Gram-Schmidt Procedure on Tuples) By means of the Gram-Schmidt orthonormalization procedure, find an orthonormal basis for the subspace spanned by the four vectors $v_1 = (1, 0, 1, 1)^T$, $v_2 = (2, 1, 0, 1)^T$, $v_3 = (1, 0, 1, 2)^T$, and $v_4 = (2, 0, 2, 1)^T$.
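A small script is handy for checking a hand computation of Gram-Schmidt. This minimal sketch, using the problem's four vectors as printed, outputs an orthonormal basis for their span:

```python
import numpy as np

# Gram-Schmidt: orthonormal basis for the span of the input vectors.
def gram_schmidt(vectors, tol=1e-10):
    basis = []
    for v in vectors:
        u = v - sum(np.dot(v, b) * b for b in basis)   # strip projections
        if np.linalg.norm(u) > tol:                    # keep new directions only
            basis.append(u / np.linalg.norm(u))
    return basis

vs = [np.array(v, float) for v in
      [(1, 0, 1, 1), (2, 1, 0, 1), (1, 0, 1, 2), (2, 0, 2, 1)]]
for b in gram_schmidt(vs):
    print(np.round(b, 4))
```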

Problem 2. (Gram-Schmidt Procedure on Two Waveforms) Use the Gram-Schmidt procedure to find an orthonormal basis for the vector space spanned by the functions shown below.

[Figure: the two waveforms $w_0(t)$ and $w_1(t)$.]

Problem 3. (Gram-Schmidt Procedure on Three Waveforms)

[Figure: the three waveforms $w_0(t)$, $w_1(t)$, and $w_2(t)$.]

(a) By means of the Gram-Schmidt procedure, find an orthonormal basis for the space spanned by the waveforms in the figure.

(b) In your chosen orthonormal basis, let $w_0(t)$ and $w_1(t)$ be represented by the codewords $c_0 = (3, 1, 1)^T$ and $c_1 = (1, 2, 3)^T$. Plot $w_0(t)$ and $w_1(t)$.

(c) Compute the (standard) inner products $\langle c_0, c_1 \rangle$ and $\langle w_0, w_1 \rangle$ and compare them.

(d) Compute the norms $\|c_0\|$ and $\|w_0\|$ and compare them.


Problem 4. (Noise in Regions) Let $N(t)$ be white Gaussian noise of power spectral density $\frac{N_0}{2}$. Let $g_1(t)$, $g_2(t)$, and $g_3(t)$ be waveforms as shown in Figure 3.7. For $i = 1, 2, 3$, let $Z_i = \int N(t) g_i(t)\, dt$, $Z = (Z_1, Z_2)^T$, and $U = (Z_1, Z_3)^T$.

[Figure 3.7: the waveforms $g_1(t)$, $g_2(t)$, and $g_3(t)$.]

(a) Determine the norm $\|g_i\|$, $i = 1, 2, 3$.

(b) Are $Z_1$ and $Z_2$ independent? Justify your answer.

(c) Find the probability $P_a$ that $Z$ lies in the square of Figure 3.8(a).

(d) Find the probability $P_b$ that $Z$ lies in the square of Figure 3.8(b).

(e) Find the probability $Q_a$ that $U$ lies in the square of Figure 3.8(a).

(f) Find the probability $Q_c$ that $U$ lies in the square of Figure 3.8(c).

[Figure 3.8: the three square regions (a), (b), and (c) in the plane of $(Z_1, Z_2)$ (respectively $(Z_1, Z_3)$) referred to in parts (c)-(f).]

Problem 5. (Matched Filter Implementation) In this problem, we consider the implementation of matched filter receivers. In particular, we consider Frequency Shift Keying (FSK) with the following signals:

$$w_j(t) = \begin{cases} \sqrt{\frac{2}{T}} \cos\left( 2\pi \frac{n_j}{T} t \right), & 0 \le t \le T, \\ 0, & \text{otherwise}, \end{cases} \qquad (3.8)$$


where $n_j \in \mathbb{Z}$ and $0 \le j \le m-1$. Thus, the communication scheme consists of $m$ signals $w_j(t)$ of different frequencies $\frac{n_j}{T}$.

(a) Determine the impulse response $h_j(t)$ of the matched filter for the signal $w_j(t)$. Plot $h_j(t)$.

(b) Sketch the matched filter receiver. How many matched filters are needed?

(c) For $T \le t \le 3T$, sketch the output of the matched filter with impulse response $h_j(t)$ when the input is $w_j(t)$.

(d) Consider the following ideal resonance circuit:

[Figure: a parallel LC circuit driven by a current source $i(t)$, with $u(t)$ the voltage across it.]

For this circuit, the voltage response to the input current $i(t) = \delta(t)$ is

$$h(t) = \frac{1}{C} \cos\left( \frac{t}{\sqrt{LC}} \right). \qquad (3.9)$$

Show how this can be used to implement the matched filter for the signal $w_j(t)$. Determine how $L$ and $C$ should be chosen. (Hint: Suppose that $i(t) = w_j(t)$. In this case, what is $u(t)$?)

Problem 6. (On-Off Signaling) Consider the binary hypothesis testing problem specified by:

$$H = 0: \quad R(t) = w(t) + N(t)$$
$$H = 1: \quad R(t) = N(t),$$

where $N(t)$ is additive white Gaussian noise of power spectral density $N_0/2$ and $w(t)$ is the signal shown in the left figure.

(a) Describe the maximum likelihood receiver for the received signal $R(t)$, $t \in \mathbb{R}$.

(b) Determine the error probability for the receiver you described in (a).

(c) Sketch a block diagram of your receiver of part (a) using a filter with impulse response $h(t)$ shown in the right figure.


[Figure 3.9: the signal $w(t)$ (left) and the impulse response $h(t)$ (right).]

Problem 7. (Matched Filter Intuition) In this problem, we develop further intuition about matched filters. You may assume that all waveforms are real-valued. Let $R(t) = w(t) + N(t)$ be the channel output, where $N(t)$ is additive white Gaussian noise of power spectral density $N_0/2$ and $w(t)$ is an arbitrary but fixed pulse. Let $\phi(t)$ be a unit-norm but otherwise arbitrary pulse, and consider the receiver operation

$$Y = \langle R, \phi \rangle = \langle w, \phi \rangle + \langle N, \phi \rangle. \qquad (3.10)$$

$Y$ is a sufficient statistic for determining whether $w(t)$ or $-w(t)$ was transmitted. The signal-to-noise ratio (SNR) is defined as

$$\mathrm{SNR} = \frac{|\langle w, \phi \rangle|^2}{E\left[ |\langle N, \phi \rangle|^2 \right]}.$$

Notice that the SNR remains the same if we scale $\phi(t)$ by a constant factor. Hence we assume that $\phi(t)$ has unit norm, which implies

$$E\left[ |\langle N, \phi \rangle|^2 \right] = \frac{N_0}{2}. \qquad (3.11)$$

(a) Use the Cauchy-Schwarz inequality to give an upper bound on the SNR. What is the condition for equality in the Cauchy-Schwarz inequality? Find the $\phi(t)$ that maximizes the SNR. What is the relationship between the maximizing $\phi(t)$ and the signal $w(t)$?

(b) Let us verify that we would get the same result using a pedestrian approach. Instead of waveforms we consider tuples. So let $c = (c_1, c_2)^T \in \mathbb{R}^2$ and use calculus (instead of the Cauchy-Schwarz inequality) to find the $\phi = (\phi_1, \phi_2)^T \in \mathbb{R}^2$ that maximizes $\langle c, \phi \rangle$, subject to the constraint that $\phi$ has unit norm.

(c) Verify with a picture (convolution) that the output at time $T$ of a filter with input $w(t)$ and impulse response $h(t) = w(T - t)$ is indeed $\int_0^T w^2(t)\, dt$.

(d) We can also look at the situation in terms of Fourier transforms. Write out the frequency response of the matched filter.


Problem 8. (AWGN Channel and Sufficient Statistic) Let $\mathcal{W} = \{w_0(t), w_1(t)\}$ be the signal constellation used to communicate an equiprobable bit across an additive Gaussian noise channel. In this problem, we verify that the projection of the channel output onto the inner-product space $\mathcal{V}$ spanned by $\mathcal{W}$ is not necessarily a sufficient statistic unless the noise is white. Let $\phi_1(t), \phi_2(t)$ be an orthonormal basis for $\mathcal{V}$. We choose the additive noise to be $Z(t) = N_1\phi_1(t) + N_2\phi_2(t) + N_3\phi_3(t)$ for some normalized $\phi_3(t)$ that is orthogonal to $\phi_1(t)$ and $\phi_2(t)$, and choose $N_1$, $N_2$ and $N_3$ to be zero-mean jointly Gaussian random variables of identical variance $\sigma^2$. Let $c_i = (c_{i,1}, c_{i,2}, 0)^T$ be the codeword associated to $w_i(t)$ with respect to the extended orthonormal basis $\phi_1(t), \phi_2(t), \phi_3(t)$. There is a one-to-one correspondence between the channel output $R(t)$ and $Y = (Y_1, Y_2, Y_3)^T$, where $Y_i = \langle R, \phi_i \rangle$. In terms of $Y$, the hypothesis testing problem is

$$H = i: \quad Y = c_i + N, \quad i = 0, 1,$$

where we have defined $N = (N_1, N_2, N_3)^T$.

(a) As a warm-up exercise, let us first assume that $N_1$, $N_2$ and $N_3$ are independent. Use the Neyman-Fisher factorization theorem (Problem 21 of Chapter 2) to show that $(Y_1, Y_2)$ is a sufficient statistic.

(b) Now assume that $N_1$ and $N_2$ are independent but $N_3 = N_2$. Prove that in this case $(Y_1, Y_2)$ is not a sufficient statistic.

(c) To check a specific case, consider $c_0 = (1, 0, 0)^T$ and $c_1 = (0, 1, 0)^T$. Determine the error probability of an ML receiver when it observes $(Y_1, Y_2)^T$ and when it observes $(Y_1, Y_2, Y_3)^T$.

Problem 9. (Mismatched Receiver) Let the channel output be

$$R(t) = c\,X\,w(t) + N(t), \qquad (3.12)$$

where $c > 0$ is some deterministic constant, $X$ is a uniformly distributed random variable that takes values in $\{3, 1, -1, -3\}$, and $w(t)$ is the deterministic waveform given by $w(t) = 1$ for $0 \le t \le$ …


(c) Suppose now that you still use the receiver you have described in Part (a), but that the received signal is actually

$$R(t) = \frac{3}{4}\,c\,X\,w(t) + N(t), \qquad (3.14)$$

i.e., you were unaware that the channel was attenuating the signal. What is the probability of error now?

(d) Suppose now that you still use the receiver you have found in Part (a) and that $R(t)$ is according to Equation (3.12), but that the noise is colored. In fact, $N(t)$ is a zero-mean stationary Gaussian noise process of auto-covariance function

$$K_N(\tau) = \frac{1}{4} e^{-|\tau|/\alpha},$$

where $\alpha > 0$ …