
Chapter 6

Convolutional Coding and Viterbi Decoding

6.1 Introduction

In this chapter we shift focus to the encoder/decoder pair. The general setup is that of Figure 6.1, where $N(t)$ is white Gaussian noise of power spectral density $N_0/2$. The details of the waveform former and the $n$-tuple former are immaterial for this chapter. The important fact is that the channel model from the encoder output to the decoder input is the discrete-time AWGN channel of noise variance $\sigma^2 = N_0/2$.

The study of coding/decoding methods has been an active research area since the second half of the twentieth century. It is called coding theory. There are many coding techniques, and a general introduction to coding can easily occupy a one-semester graduate-level course. Here we just consider an example of a technique called convolutional coding. By considering a specific example, we can considerably simplify the notation. As seen in the exercises, applying the techniques learned in this chapter to other convolutional encoders is fairly straightforward. We choose convolutional coding for two reasons: (i) it is well suited to the discrete-time AWGN channel; (ii) it allows us to introduce various instructive and useful tools, notably the Viterbi algorithm for maximum likelihood decoding and the technique for upper bounding the resulting bit error probability.


[Figure 6.1: System view for the current chapter. (Block diagram: $d_1, \ldots, d_k$ enter the Encoder, which outputs $c_1, \ldots, c_n$ to the Waveform Former; the channel adds $N(t)$; from $R(t)$ the Baseband Front-End produces $y_1, \ldots, y_n$ for the Decoder, which outputs $\hat d_1, \ldots, \hat d_k$. The equivalent discrete-time channel is $y_j = c_j + Z_j$ with $Z_j \sim \mathcal{N}(0, \frac{N_0}{2})$.)]

    6.2 The Encoder

The encoder is the device that takes the message and produces the discrete-time channel input. In this chapter the message consists of a sequence $b_1, b_2, \ldots, b_k$ of binary source symbols. To simplify the description of the encoder, we let the source symbols take value in $\{\pm1\}$.

For comparison with bit-by-bit on a pulse train, we let the channel symbols take value in $\{\pm\sqrt{E_s}\}$. In describing the encoder output we do not want to carry around the factor $\sqrt{E_s}$. Hence we declare the encoder output to be the sequence $x_1, x_2, \ldots, x_n$, where

$$x_j = \frac{c_j}{\sqrt{E_s}} \in \{\pm1\}, \qquad j = 1, \ldots, n.$$

The source symbols enter the encoder sequentially, at regular intervals determined by the encoder clock. During the $j$th epoch, $j = 1, 2, \ldots$, the encoder takes $b_j$ and produces two output symbols, $x_{2j-1}$ and $x_{2j}$, according to the encoding map

$$x_{2j-1} = b_j b_{j-2}, \qquad x_{2j} = b_j b_{j-1} b_{j-2}.$$


To produce $x_1$ and $x_2$ the encoder needs $b_0$ and $b_{-1}$, which are assumed to be $1$ by default.

The circuit that implements the convolutional encoder is depicted in Figure 6.2, where $\otimes$ denotes multiplication in $\mathbb{R}$ or, equivalently, addition in the binary field with elements $\{\pm1\}$ (see Problem 5). A shift register stores the past two inputs. As implied by the indices, the outputs are serialized.

[Figure 6.2: Convolutional encoder. (A shift register holds $b_{j-1}$ and $b_{j-2}$; the outputs are $x_{2j-1} = b_j b_{j-2}$ and $x_{2j} = b_j b_{j-1} b_{j-2}$.)]

Notice that the encoder output has length $n = 2k$. The following is an example of a source sequence of length $k = 5$ and the corresponding encoder output sequence of length $n = 10$:

    j                   1     2     3     4     5
    b_j                 1     1     1     1     1
    x_{2j-1}, x_{2j}    1,1   1,1   1,1   1,1   1,1
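The encoding map translates directly into code. Below is a minimal sketch in Python (ours, not the text's; the function name `encode` is a hypothetical choice). It implements $x_{2j-1} = b_j b_{j-2}$ and $x_{2j} = b_j b_{j-1} b_{j-2}$ with the default $b_0 = b_{-1} = 1$, and can be used to regenerate tables like the one above.

    def encode(b):
        """Map a +/-1 source sequence b_1, ..., b_k to x_1, ..., x_{2k}."""
        state = (1, 1)                           # (b_{j-1}, b_{j-2}), initially (1, 1)
        x = []
        for bj in b:
            x.append(bj * state[1])              # x_{2j-1} = b_j b_{j-2}
            x.append(bj * state[0] * state[1])   # x_{2j}   = b_j b_{j-1} b_{j-2}
            state = (bj, state[0])               # shift-register update
        return x

    print(encode([1, 1, 1, 1, 1]))   # the all-one example above: ten 1s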

Because the $n = 2k$ encoder output symbols are determined by the $k$ input bits, only $2^k$ of the $2^n$ sequences of $\{\pm\sqrt{E_s}\}^n$ are codewords. Hence we use only a fraction $2^k/2^n = 2^{-k}$ of all possible length-$n$ channel input sequences. From a high-level point of view, we give up a factor two in the bit rate to make the signal space much less crowded, hoping that this will significantly reduce the probability of error.

We have already seen two ways to describe the encoder (the encoding map and the encoding circuit). A third way, useful in determining the error probability, is the state diagram of Figure 6.3. The diagram describes a finite state machine. The state of the convolutional encoder is what the encoder needs to know about past inputs so that the state and the current input determine the current output. For the convolutional encoder of Figure 6.2, the state at time $j$ can be defined to be $(b_{j-1}, b_{j-2})$. Hence we have 4 states.

As the diagram shows, there are two possible transitions from each state. The input symbol $b_j$ decides which of the two transitions is taken during epoch $j$. Transitions are labeled by $b_j \mid x_{2j-1}, x_{2j}$. To be consistent with the default $b_{-1} = b_0 = 1$, the state is $(1, 1)$ when $b_1$ enters the encoder.


[Figure 6.3: State diagram description of the convolutional encoder. (The four states $t$, $l$, $r$, $b$ are the values of $(b_{j-1}, b_{j-2}) \in \{\pm1\}^2$; each state has two outgoing transitions, labeled $b_j \mid x_{2j-1}, x_{2j}$.)]
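The finite-state-machine reading of the encoder fits in a few lines of code. The sketch below (ours) tabulates the next-state and output functions that the state diagram draws; running it enumerates the 8 labeled transitions.

    def step(state, bj):
        """One transition: state = (b_{j-1}, b_{j-2}), input b_j in {+1, -1}."""
        b1, b2 = state
        output = (bj * b2, bj * b1 * b2)   # edge label (x_{2j-1}, x_{2j})
        return (bj, b1), output            # next state, output pair

    for state in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        for bj in (1, -1):
            nxt, out = step(state, bj)
            print(state, "--", bj, "|", out, "->", nxt)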

The choice of letting the encoder input and output symbols be the elements of $\{\pm1\}$ is not standard. Most authors choose the input/output alphabet to be $\{0, 1\}$ and use addition modulo 2 instead of multiplication. Our choice is better suited for the AWGN channel, whereas the conventional viewpoint has the advantage of making it evident that the code is linear. As it is linear and time-invariant, the output is the result of convolving the input with an impulse response (even though the coding literature does not talk about impulse responses). In Exercise 6 we establish the link between the two viewpoints and in Exercise 5 we prove from first principles that the encoder is indeed linear.

In each epoch, the convolutional encoder we have chosen has $k_0 = 1$ symbol entering and $n_0 = 2$ symbols exiting the encoder. In general, a convolutional encoder is specified by (i) the number $k_0$ of source symbols entering the encoder in each epoch; (ii) the number $n_0$ of symbols produced by the encoder in each epoch, where $n_0 > k_0$; (iii) the constraint length $m_0$, defined as the number of input $k_0$-tuples used to determine an output $n_0$-tuple; and (iv) the encoding function, specified for instance by a $k_0 \times m_0$ matrix of 1s and 0s for each position of the output $n_0$-tuple. In our example, $k_0 = 1$, $n_0 = 2$, $m_0 = 3$, and the encoding function is specified by $[1, 0, 1]$ and $[1, 1, 1]$ (compare to the connections that determine the top and bottom output in Figure 6.2). In our case, the elements of the output $n_0$-tuple are serialized into a single sequence that we consider to be the actual encoder output, but there are other possibilities. For instance, we could take the pair $x_{2j-1}, x_{2j}$ and map it into a 4-ary channel input symbol.
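Under the same conventions, a general $k_0 = 1$ encoder of this family can be parameterized by its generator vectors. The sketch below (ours; it assumes the tap convention just described, a 1 in position $i$ meaning that $b_{j-i}$ enters the product) reproduces the chapter's encoder when given $[1, 0, 1]$ and $[1, 1, 1]$.

    def encode_general(b, generators):
        """Rate 1/n0 convolutional encoder over {+1, -1}.

        generators: n0 tap vectors of length m0; tap i selects b_{j-i}.
        """
        m0 = len(generators[0])
        past = [1] * m0                  # b_j, b_{j-1}, ..., b_{j-m0+1}; default 1s
        out = []
        for bj in b:
            past = [bj] + past[:-1]      # shift in the new source symbol
            for g in generators:
                prod = 1
                for tap, symbol in zip(g, past):
                    if tap:
                        prod *= symbol   # "addition" in {+1, -1} is multiplication
                out.append(prod)
        return out

    # Reproduces x_{2j-1} = b_j b_{j-2} and x_{2j} = b_j b_{j-1} b_{j-2}:
    print(encode_general([1, -1, 1], [[1, 0, 1], [1, 1, 1]]))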


    6.3 The Decoder

A maximum likelihood (ML) decoder for the AWGN channel decides for (one of) the encoder output sequences $x$ that maximizes

$$\langle y, c\rangle - \frac{\|c\|^2}{2},$$

where $y$ is the decoder input sequence and $c = \sqrt{E_s}\,x$. The last term in the above expression is irrelevant, as it is $\frac{nE_s}{2}$, thus the same for all $c$. Furthermore, finding an $x$ that maximizes $\langle y, c\rangle = \langle y, \sqrt{E_s}\,x\rangle$ is the same as finding one that maximizes $\langle y, x\rangle$.

In the above paragraph, we have introduced a slight abuse of notation. Up to this point the inner product and the norm have been defined for vectors of $\mathbb{C}^n$ written in column form, with $n$ being an arbitrary but fixed positive integer. Considering $n$-tuples in column form is a standard mathematical practice when matrix operations are involved. (We have used matrix notation to express the density of Gaussian random vectors.) In coding theory, people find it more useful to write $n$-tuples in row form because it saves space when specific examples are given. For instance, it would be impractical to label trellis branches with column vectors. Furthermore, the length $k$ of the information sequence and the length $n$ of the encoder output sequence can vary from message to message. For these reasons, we prefer referring to them as sequences rather than as $k$- and $n$-tuples, respectively. The abuse of notation we refer to is that we take inner products and norms of sequences whose length varies as needed.

To find an $x$ that maximizes $\langle x, y\rangle$, we could in principle compute $\langle x, y\rangle$ for all $2^k$ sequences that can be produced by the encoder. This brute-force approach would be quite impractical. For instance, if $k = 100$ (which is a relatively modest value for $k$), $2^k = (2^{10})^{10}$, which is approximately $(10^3)^{10} = 10^{30}$. Using this approximation, a VLSI chip that makes $10^9$ inner products per second takes $10^{21}$ seconds to check all possibilities. This is roughly $4 \times 10^{13}$ years. The universe is only roughly $2 \times 10^{10}$ years old! What we need is a method that finds a maximizing $x$ with a number of operations that grows linearly (as opposed to exponentially) in $k$. We will see that the so-called Viterbi algorithm achieves this.
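For concreteness, here is what the brute-force approach looks like in code (our sketch; `encode` is the hypothetical encoder function sketched in Section 6.2). The loop over $2^k$ candidates is exactly what makes it infeasible for large $k$.

    from itertools import product

    def ml_brute_force(y, k, encode):
        """Exhaustive ML decoding: maximize <y, x> over all 2^k codewords."""
        best_b, best_metric = None, float("-inf")
        for b in product([1, -1], repeat=k):      # 2^k candidate inputs
            x = encode(list(b))
            metric = sum(xj * yj for xj, yj in zip(x, y))
            if metric > best_metric:
                best_b, best_metric = list(b), metric
        return best_b

Already for $k = 30$ this loop has about $10^9$ iterations; the Viterbi algorithm described next avoids it entirely.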

To describe the Viterbi algorithm (VA), we introduce a fourth way of describing a convolutional encoder, namely the trellis. The trellis is an unfolded transition diagram that keeps track of the passage of time. For our example, if we assume that we start at state $(1, 1)$, that we encode $k = 5$ source bits and then feed the encoder with the dummy bits $b_{k+1} = b_{k+2} = 1$ to make it go back to the initial state and thus be ready for the next transmission, we obtain the trellis description shown on the top of Figure 6.4.

There is a one-to-one correspondence between an encoder input sequence $b$, an encoder output sequence $x$, and a path (or state sequence) that starts at the initial state $(1, 1)$ (left state) and ends at the final state $(1, 1)$ (right state) of the trellis. Hence we can refer to a path by means of an input sequence, an output sequence, or a sequence of states.


The trellis on the top of Figure 6.4 has edges labeled by the corresponding encoder output pair. The encoder input symbol is omitted, as it is the first component of the next state. To decode using the Viterbi algorithm, we replace the label of each edge with the edge metric (also called branch metric), computed as follows. The edge with $x_{2j-1} = a$ and $x_{2j} = b$, where $a, b \in \{\pm1\}$, is assigned the edge metric $a y_{2j-1} + b y_{2j}$. Now if we add up all the edge metrics along a path, we obtain the path metric $\langle x, y\rangle$ for that path.

Example 91. Consider the trellis on the top of Figure 6.4 and let the decoder input sequence be $y = (1, 3), (2, 1), (4, 1), (5, 5), (3, 3), (1, 6), (2, 4)$. For convenience, we chose the components of $y$ to be integers, but in reality they are real-valued. Also for convenience, we use parentheses to group the components of $y$ into pairs $(y_{2j-1}, y_{2j})$ that belong to the same trellis section. The edge metrics are shown on the second trellis (from the top) of Figure 6.4. Once again, by adding the edge metrics along one path, we obtain the $\langle x, y\rangle$ for that path.

The problem of finding (one of) the $x$ that maximizes $\langle x, y\rangle$ is reduced to the problem of finding the path with the largest path metric. Next we give an example of how to do this and then we explain why it works.

Example 92. Our starting point is the second trellis of Figure 6.4, which has been labeled with the edge metrics. We construct the third trellis, in which every state is labeled with the metric of the surviving path to that state, obtained as follows. We use $j = 0, 1, \ldots, k+2$ to run over the trellis depth. (The $+2$ relates to the two dummy bits.) Depth $j = 0$ refers to the initial state (leftmost) and depth $j = k+2$ to the final state (rightmost). Let $j = 0$ and to the single state at depth $j$ assign the metric $0$. Let $j = 1$ and label each of the two states at depth $j$ with the metric of the only subpath to that state. (See the third trellis from the top.) Let $j = 2$ and label the four states at depth $j$ with the metric of the only subpath to that state. For instance, the label of the state $(1, 1)$ at depth $j = 2$ is obtained by adding the metric of the single state and the single edge that precedes it. From $j = 3$ on, the situation is more interesting, because now every state can be reached from two previous states. We label the state under consideration with the larger of the two subpath metrics to that state and make sure to remember to which of the two subpaths it corresponds. In the figure, we make this distinction by dashing the last edge of the other path. (If we were doing this by hand, we would not need a third trellis. Rather, we would label the states on the second trellis and put a cross on the edges that are dashed on our third trellis.) The subpath with the highest branch metric (the one that has not been dashed) is called the survivor. We continue similarly for $j = 4, 5, \ldots, k+2$. At depth $j = k+2$ there is only one state, and its label maximizes $\langle x, y\rangle$ over all paths. By tracing back along the non-dashed path, we find the maximum likelihood path. From it, we can read out the corresponding bit sequence. The maximum likelihood path is shown in bold on the fourth and last trellis of Figure 6.4.

From the above example, it is clear that, starting from the left and working its way to the right, the Viterbi algorithm visits all states and keeps track of the subpath that has the largest metric to that state. In particular, the algorithm finds the path between the initial state and the final state that has the largest metric.

The complexity of the Viterbi algorithm is linear in the number of trellis sections, i.e., in $k$. Recall that the brute-force approach has complexity exponential in $k$. The saving of the Viterbi algorithm comes from not having to compute the metric of non-survivors. When we prune an edge at depth $j$, we are in fact eliminating $2^{k-j}$ possible extensions of that edge. The brute-force approach computes the metric of those extensions, but not the Viterbi algorithm.

A formal definition of the VA (one that can be programmed on a computer) and a more formal argument that it finds the path that maximizes $\langle x, y\rangle$ is given in Appendix 6.A.


    !"!!!"!

    !

    !!"!!

    !"!

    !!"!

    !"!!

    !!"!

    !"!!

    !"!!!"!

    !

    !!"!!

    !"!

    !!"!

    !"!!

    !!"!

    !"!!

    !"!!!"!

    !

    !!"!!

    !"!

    !!"!

    !"!!

    !!"!

    !"!!

    !"!

    !!"!

    !"!!

    !!"!!

    #$%$&

    !'

    '

    !

    !!

    (

    )

    !)

    *

    !*

    !+

    +(

    !(

    *

    !*

    *!,

    !!,

    !!,

    !,

    ,

    ,

    ,!-

    -

    -

    !-

    ,

    ,

    ,

    ,

    (

    !(

    !(

    !* ,

    !"!

    !!"!

    !"!!

    !!"!!

    #$%$&

    !!"!!

    !"!

    !!"!!

    !"!

    !!"!

    !"!!

    !!"!

    !

    !"!

    !!"!

    !

    !"!

    !!"!

    !!"!

    !"!!

    (!

    !' *

    ' (

    !+

    !!

    -

    ,

    !,

    '

    !"!

    !!"!

    !"!!

    !!"!!

    #$%$&

    !'

    '

    !

    !!

    (

    )

    !)

    *

    !*

    !+

    +(

    !(

    *

    !*

    *!,

    !!,

    !!,

    !,

    ,

    ,

    ,!-

    -

    -

    !-

    ,

    ,

    ,

    ,

    (

    !(

    !(

    !* ,

    '

    '

    ),

    !-

    ),

    ),

    ))

    !,

    ).

    )* (!

    !' *

    ' (

    !+

    !!

    -

    ,

    !,

    '

    !"!

    !!"!

    !"!!

    !!"!!

    #$%$&

    !'

    '

    !

    !!

    (

    !)

    *

    !*

    !+

    +(

    !(

    *

    !*

    *!,

    !!,

    !!,

    !,

    ,

    ,

    ,!-

    -

    -

    !-

    ,

    ,

    ,

    ,

    (

    !(

    !(

    !* ,

    '

    '

    ),

    !-

    ),

    ),

    ))

    !,

    ).

    )*

    )

    Figure 6.4: The Viterbi algorithm. Top figure: Trellis representing the encoderwhere edges are labeled with the corresponding output symbols. Second figure:

    Edges are re-labeled with the edge metric corresponding to the received sequence(1, 3), (2, 1), (4,1), (5, 5), (3,3), (1,6), (2,4) (parentheses have been

    inserted to facilitate parsing). Third figure: Each state has been labeled with themetric of a survivor to that state and non-surviving edges are pruned (dashed).Fourth figure: Tracing back from the end, we find the decoded path (bold); it

    corresponds to the source sequence 1, 1, 1, 1,1, 1, 1 .


    6.4 Bit Error Probability

In this section we derive an upper bound to the bit error probability $P_b$. As usual, we fix a signal and we evaluate the error probability conditioned on this signal being transmitted. If the result depends on the chosen signal (which is not the case here), then we remove the conditioning by averaging over all signals.

Each signal that can be produced by the transmitter corresponds to a path in the trellis. The path we condition on is referred to as the reference path. We are free to choose the reference path and, for notational convenience, we choose the all-one path: it is the one that corresponds to the information sequence being a sequence of $k$ ones with initial encoder state $(1, 1)$. The encoder output is a sequence of $1$s of length $n = 2k$.

The task of the decoder is to find (one of) the paths in the trellis that has the largest $\langle x, y\rangle$, where $x$ is the encoder output that corresponds to that path. If the decoder does not come up with the correct path, it is because it chooses a path that contains one or more detours. Detours (with respect to the reference path) are trellis path segments that share with the reference path only the starting and the ending state.¹ See Figure 6.5.

[Figure 6.5: Detours. (A reference path with two detours; each detour splits from the reference path at its start and merges again at its end.)]

Errors are produced when the decoder follows a detour. To compute the bit error probability, we study the random process produced by the decoder when it chooses the maximum likelihood trellis path. Each such path is either the correct path or else it breaks down into some number of detours. To the path selected by the decoder, we associate a sequence $\omega_0, \omega_1, \ldots, \omega_{k-1}$ defined as follows. If there is a detour that starts at depth $j$, $j = 0, 1, \ldots, k-1$, we let $\omega_j$ be the number of bit errors produced by that detour. In all other cases we let $\omega_j = 0$. Then $\sum_{j=0}^{k-1}\omega_j$ is the number of bits that are incorrectly decoded and $\frac{1}{kk_0}\sum_{j=0}^{k-1}\omega_j$ is the corresponding fraction of bits ($k_0 = 1$ in our running example). Over the ensemble of all possible noise processes, $\omega_j$ becomes a random variable

¹ For an analogy, the reader can think of the trellis as a road map, of the reference path as an intended road for a journey, and of the path selected by the decoder as the actual road taken during the journey. Due to construction, occasionally the actual path splits from the intended path, to merge again with it at some later point. A detour is the chunk of road from the split to the merge.


$\Omega_j$, and the bit error probability is

$$P_b \triangleq E\left[\frac{1}{kk_0}\sum_{j=0}^{k-1}\Omega_j\right] = \frac{1}{kk_0}\sum_{j=0}^{k-1} E[\Omega_j].$$

To upper bound the above expression, we need to learn how many detours of a certain kind there are. We do so in the next section.

    6.4.1 Counting Detours

In this subsection, we consider the infinite trellis obtained by extending the finite trellis in both directions. Each path of the infinite trellis corresponds to an infinite input sequence $b = \ldots, b_{-1}, b_0, b_1, b_2, \ldots$ and an infinite output sequence $x = \ldots, x_{-1}, x_0, x_1, x_2, \ldots$. These are sequences that belong to $\{\pm1\}^{\infty}$.

To each such detour we can associate two numbers, namely the input distance $i$ and the output distance $d$. The input distance is the number of positions in which the two input sequences differ over the course of the detour. Likewise, the output distance is the number of output discrepancies over the course of the detour.

We seek the answer to the following question. For any given reference path and depth $j \in \{0, 1, \ldots\}$, what is the number $a(i, d)$ of detours that start at depth $j$ and have input distance $i$ and output distance $d$ with respect to the reference path? This number depends neither on $j$ nor on the reference path. It does not depend on $j$ because the encoder is a time-invariant machine, i.e., all the sections of the infinite trellis are identical. We will see that it does not depend on the reference path either, because the encoder is linear in a sense that we will discuss.

Example 93. Using again the top subfigure of Figure 6.4, we can verify by inspection that for each reference path and each positive integer $j$ there is a single detour that starts at depth $j$ and has parameters $i = 1$ and $d = 5$. Thus $a(1, 5) = 1$. We can also verify that $a(2, 5) = 0$ and $a(2, 6) = 2$.

To determine $a(i, d)$, we choose the all-one path as the reference. We modify the state diagram into one for which all paths are detours with respect to the reference path. This is the detour flow graph shown in Figure 6.6. It is obtained by removing from the state diagram the self-loop of state $b = (1, 1)$ and by splitting $b$ open to create two new states, denoted by $s$ (for start) and $e$ (for end). For every $j$, there is a one-to-one correspondence between a detour to the all-one path that starts at depth $j$ and a path between node $s$ and node $e$ of the detour flow graph.


From the detour flow graph, we see that the various generating functions are related as follows, where to simplify notation we drop the two arguments ($I$ and $D$) of the generating functions:

$$T_l = ID^2 + T_r I, \qquad T_t = T_l ID + T_t ID, \qquad T_r = T_l D + T_t D, \qquad T_e = T_r D^2.$$

This system can be solved for $T_e$ (hence for $T$) by purely formal manipulations, like solving a system of equations. The result is

$$T(I, D) = \frac{ID^5}{1 - 2ID}.$$

As we will see shortly, the generating function $T(I, D)$ of $a(i, d)$ is more useful than $a(i, d)$ itself. However, to show that we can indeed obtain $a(i, d)$ from $T(I, D)$, we use the expansion $\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots$ to write

$$T(I, D) = \frac{ID^5}{1 - 2ID} = ID^5\big(1 + 2ID + (2ID)^2 + (2ID)^3 + \cdots\big) = ID^5 + 2I^2D^6 + 2^2I^3D^7 + 2^3I^4D^8 + \cdots$$

This means that there is one path with parameters $d = 5$, $i = 1$, that there are two paths with $d = 6$, $i = 2$, etc. The general expression for $i = 1, 2, \ldots$ is

$$a(i, d) = \begin{cases} 2^{i-1}, & d = i + 4 \\ 0, & \text{otherwise.} \end{cases}$$

By means of the detour flow graph, it is straightforward to verify this expression for small values of $i$ and $d$.
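The verification can be automated. The sketch below (ours) transcribes the detour flow graph into an edge list, where each edge carries its contribution to the input distance $i$ and the output distance $d$, and counts the paths from $s$ to $e$; under these assumptions it reproduces $a(1, 5) = 1$, $a(2, 6) = 2$, $a(3, 7) = 4$, $a(4, 8) = 8$.

    from collections import Counter

    # node -> list of (successor, i-increment, d-increment); this transcribes
    # T_l = ID^2 + T_r I, T_t = T_l ID + T_t ID, T_r = T_l D + T_t D, T_e = T_r D^2.
    edges = {
        "s": [("l", 1, 2)],
        "l": [("t", 1, 1), ("r", 0, 1)],
        "t": [("t", 1, 1), ("r", 0, 1)],
        "r": [("l", 1, 0), ("e", 0, 2)],
    }

    a = Counter()
    frontier = [("s", 0, 0)]            # (node, input distance, output distance)
    for _ in range(12):                 # 12 edges suffice for i <= 4, d <= 8
        new_frontier = []
        for node, i, d in frontier:
            for succ, di, dd in edges[node]:
                if succ == "e":
                    a[(i + di, d + dd)] += 1
                else:
                    new_frontier.append((succ, i + di, d + dd))
        frontier = new_frontier

    for i in range(1, 5):
        print(i, i + 4, a[(i, i + 4)])  # prints 1, 2, 4, 8 = 2^(i-1)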

It remains to be shown that $a(i, d)$ (the number of detours that start at any given depth $j$ and have parameters $i$ and $d$) does not depend on which reference path we choose. We do this in Exercise 7.

6.4.2 Upper Bound to $P_b$

We are now ready for the final step: the derivation of an upper bound to the bit-error probability. We recapitulate.


We fix an arbitrary encoder input sequence, let $x = x_1, x_2, \ldots, x_n$ be the corresponding encoder output and $c = \sqrt{E_s}\,x$ the channel input sequence. The waveform signal is

$$w(t) = \sum_{j=1}^{n} c_j\phi_j(t),$$

where $\phi_1(t), \ldots, \phi_n(t)$ forms an orthonormal collection. We transmit this signal over the AWGN channel with power spectral density $N_0/2$. Let $r(t) = w(t) + z(t)$ be the received signal, where $z(t)$ is a sample path of the noise process $Z(t)$, and let

$$y = y_1, \ldots, y_n, \qquad \text{where } y_i = \int r(t)\phi_i(t)\,dt,$$

be the decoder input.

The Viterbi algorithm labels each edge in the trellis with the corresponding edge metric and finds the path through the trellis with the largest path metric. An edge from depth $j-1$ to $j$ with output symbols $x_{2j-1}, x_{2j}$ is labeled with the edge metric $y_{2j-1}x_{2j-1} + y_{2j}x_{2j}$.

The maximum likelihood path selected by the Viterbi decoder could contain detours. If there is a detour that starts at depth $j$, $j = 0, 1, \ldots, k-1$, we set $\omega_j$ to be the number of bit errors made on that detour. Otherwise, we set $\omega_j = 0$. Let $\Omega_j$ be the corresponding random variable (over all possible noise realizations).

For the path selected by the Viterbi algorithm, the total number of incorrect bits is $\sum_{j=0}^{k-1}\Omega_j$, and $\frac{1}{kk_0}\sum_{j=0}^{k-1}\Omega_j$ is the fraction of errors with respect to the $kk_0$ source bits. Hence the bit-error probability is

$$P_b = \frac{1}{kk_0}\sum_{j=0}^{k-1} E[\Omega_j]. \tag{6.1}$$

The expected value $E[\Omega_j]$ can be written as follows:

$$E[\Omega_j] = \sum_h i(h)\pi(h), \tag{6.2}$$

where the sum is over all detours $h$ that start at depth $j$ with respect to the reference path, $\pi(h)$ stands for the probability that detour $h$ is taken, and $i(h)$ for the input distance between detour $h$ and the reference path.

Next we upper bound $\pi(h)$. If a detour starts at depth $j$ and ends at depth $l = j + m$, then the corresponding encoder-output symbols form a $2m$-tuple $\tilde{u} \in \{\pm1\}^{2m}$. Let $u = x_{2j+1}, \ldots, x_{2l} \in \{\pm1\}^{2m}$ and $\rho = y_{2j+1}, \ldots, y_{2l}$ be the corresponding subsequences of the reference path and of the channel output, respectively; see Figure 6.7.


[Figure 6.7: Detour and reference path, labeled with the corresponding output subsequences $\tilde{u}$ and $u$; the detour starts at depth $j$ and ends at depth $l = j + m$.]

If the Viterbi algorithm takes this detour, it must be that the subpath metric along the detour is at least as large as the corresponding subpath metric along the reference path. An equivalent condition is that $\rho$ is at least as close to $\sqrt{E_s}\,\tilde{u}$ as it is to $\sqrt{E_s}\,u$. Observe that $\rho$ has the statistic of $\sqrt{E_s}\,u + Z$, where $Z \sim \mathcal{N}(0, I_{n_0}\frac{N_0}{2})$ and $n_0$ is the common length of $u$, $\tilde{u}$, and $\rho$. The probability that $\rho$ is at least as close to $\sqrt{E_s}\,\tilde{u}$ as it is to $\sqrt{E_s}\,u$ is $Q\big(\frac{d_E}{2\sigma}\big)$, where $d_E = 2\sqrt{E_s d}$ is the Euclidean distance between $\sqrt{E_s}\,\tilde{u}$ and $\sqrt{E_s}\,u$. Using $d_E(h)$ to denote the Euclidean distance of detour $h$ to the reference path, we obtain

$$\pi(h) \le Q\left(\frac{d_E(h)}{2\sigma}\right) = Q\left(\sqrt{\frac{2E_s d(h)}{N_0}}\right),$$

where the inequality sign is needed because the event that $\rho$ is at least as close to $\sqrt{E_s}\,\tilde{u}$ as it is to $\sqrt{E_s}\,u$ is no guarantee that the Viterbi decoder will take detour $\tilde{u}$. In fact, there could be another detour even closer to $\rho$. Inserting the above bound into (6.2), we obtain the first inequality in the following chain.


$$E[\Omega_j] = \sum_h i(h)\pi(h) \le \sum_h i(h)\,Q\left(\sqrt{\frac{2E_s d(h)}{N_0}}\right) \overset{(a)}{=} \sum_{i=1}^{\infty}\sum_{d=1}^{\infty} i\,Q\left(\sqrt{\frac{2E_s d}{N_0}}\right)\tilde{a}(i, d) \overset{(b)}{\le} \sum_{i=1}^{\infty}\sum_{d=1}^{\infty} i\,Q\left(\sqrt{\frac{2E_s d}{N_0}}\right)a(i, d) \overset{(c)}{\le} \sum_{i=1}^{\infty}\sum_{d=0}^{\infty} i\,z^d a(i, d).$$

To obtain equality (a), we group the terms of the sum that have the same $i$ and $d$ and introduce $\tilde{a}(i, d)$ to denote the number of such terms in the finite trellis. $\tilde{a}(i, d)$ is the finite-trellis equivalent of the $a(i, d)$ introduced in Section 6.4.1. As the infinite trellis contains all the detours of the finite trellis and more, $\tilde{a}(i, d) \le a(i, d)$. This justifies (b). In (c) we use

$$Q\left(\sqrt{\frac{2E_s d}{N_0}}\right) \le e^{-\frac{E_s d}{N_0}} = z^d, \qquad \text{for } z = e^{-\frac{E_s}{N_0}}.$$
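Step (c) is the standard bound $Q(x) \le e^{-x^2/2}$ applied with $x = \sqrt{2E_s d/N_0}$; the sketch below (ours, with a hypothetical $E_s/N_0$ value) checks it numerically for a range of $d$.

    import math

    def Q(x):
        """Gaussian tail: Q(x) = P(N(0,1) > x) = erfc(x / sqrt(2)) / 2."""
        return 0.5 * math.erfc(x / math.sqrt(2.0))

    es_n0 = 2.0                            # hypothetical Es/N0 (linear scale)
    z = math.exp(-es_n0)
    for d in range(1, 8):
        q = Q(math.sqrt(2.0 * es_n0 * d))
        print(d, q, z ** d, q <= z ** d)   # the bound holds for every d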

For the final step we use the relationship

$$\sum_{i=1}^{\infty} i f(i) = \left.\frac{\partial}{\partial I}\sum_{i=0}^{\infty} I^i f(i)\right|_{I=1},$$

which holds for any function $f$ and can be verified by taking the derivative of $\sum_{i=0}^{\infty} I^i f(i)$ with respect to $I$ and then setting $I = 1$. Hence

$$E[\Omega_j] \le \sum_{i=1}^{\infty}\sum_{d=0}^{\infty} i\,z^d a(i, d) = \left.\frac{\partial}{\partial I}\sum_{i=1}^{\infty}\sum_{d=0}^{\infty} I^i D^d a(i, d)\right|_{I=1,\,D=z} = \left.\frac{\partial}{\partial I}T(I, D)\right|_{I=1,\,D=z}.$$

Plugging into (6.1) and using the fact that the above bound does not depend on $j$ yields

$$P_b = \frac{1}{kk_0}\sum_{j=0}^{k-1} E[\Omega_j] \le \frac{1}{k_0}\left.\frac{\partial}{\partial I}T(I, D)\right|_{I=1,\,D=z}. \tag{6.3}$$


In our specific example we have $k_0 = 1$ and $T(I, D) = \frac{ID^5}{1 - 2ID}$, hence $\frac{\partial T}{\partial I} = \frac{D^5}{(1 - 2ID)^2}$. Thus

$$P_b \le \frac{z^5}{(1 - 2z)^2}.$$
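Numerically, the bound is easy to evaluate. The sketch below (ours) assumes $E_s/N_0$ is given in dB; note the expression is meaningful only where $z < 1/2$.

    import math

    def pb_bound(es_n0_db):
        """Evaluate z^5 / (1 - 2z)^2 with z = exp(-Es/N0), Es/N0 in dB."""
        z = math.exp(-10.0 ** (es_n0_db / 10.0))
        assert z < 0.5, "bound only meaningful for z < 1/2"
        return z ** 5 / (1.0 - 2.0 * z) ** 2

    for db in range(1, 7):
        print(db, "dB:", pb_bound(db))   # drops from about 1e-2 to about 2e-9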

The bit error probability depends on the encoder and on the channel. Bound (6.3) nicely separates the two contributions. The encoder is accounted for by $T(I, D)/k_0$ and the channel by $z$. More precisely, $z^d$ is an upper bound to the probability that a maximum likelihood receiver makes a decoding error when the choice is between two encoder output sequences that have Hamming distance $d$. As shown in Problem 35(ii) of Chapter 2, we can use the Bhattacharyya bound to determine $z$ for any binary-input discrete memoryless channel. For such a channel,

$$z = \sum_y \sqrt{P(y|a)P(y|b)},$$

where $a$ and $b$ are the two letters of the input alphabet and $y$ runs over all the elements of the output alphabet. Hence, the technique used in this chapter is applicable to any binary-input discrete memoryless channel.
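As a sanity check on the formula, the sketch below (ours, with hypothetical channel parameters) evaluates $z$ for two textbook binary-input channels: the binary symmetric channel, for which $z = 2\sqrt{p(1-p)}$, and the binary erasure channel of Problem 12, for which $z$ equals the erasure probability.

    import math

    def bhattacharyya(p_given_a, p_given_b):
        """z = sum over y of sqrt(P(y|a) P(y|b)); inputs are dicts over y."""
        return sum(math.sqrt(p_given_a[y] * p_given_b[y]) for y in p_given_a)

    p = 0.1    # BSC crossover probability (hypothetical value)
    print(bhattacharyya({0: 1 - p, 1: p}, {0: p, 1: 1 - p}))  # 2*sqrt(p(1-p)) = 0.6

    eps = 0.3  # BEC erasure probability (hypothetical value)
    print(bhattacharyya({0: 1 - eps, "?": eps, 1: 0.0},
                        {0: 0.0, "?": eps, 1: 1 - eps}))      # equals eps = 0.3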

    6.5 Concluding Remarks

To be specific, in these concluding remarks we assume that the waveform former is as in the previous chapter, i.e., the transmitted signal is of the form

$$w(t) = \sum_{i=0}^{n-1} c_i\phi(t - iT)$$

for some pulse $\phi(t)$ that has unit norm and is orthogonal to its $T$-spaced translates.

Without coding, $c_j = b_j\sqrt{E_s} \in \{\pm\sqrt{E_s}\}$. Then the relevant design parameters are (the subscripts $s$ and $b$ stand for symbol and bit, respectively)

$$R_b = R_s = \frac{1}{T} \text{ [bits/s]}, \qquad E_b = E_s, \qquad P_b = Q\left(\frac{\sqrt{E_s}}{\sigma}\right) = Q\left(\sqrt{\frac{2E_s}{N_0}}\right) \le e^{-\frac{E_s}{N_0}},$$

where we have used $\sigma^2 = \frac{N_0}{2}$ and $Q(x) \le \exp\big(-\frac{x^2}{2}\big)$.

With coding, if we want the transmitted signal to have the same form in both cases, the corresponding parameters are

$$R_b = \frac{R_s}{2} = \frac{1}{2T} \text{ [bits/s]}, \qquad E_b = 2E_s, \qquad P_b \le \frac{z^5}{(1 - 2z)^2} \quad\text{where } z = e^{-\frac{E_s}{N_0}}.$$

We see that the encoder halves the bit rate and doubles the energy per bit consumed by the system. As $\frac{E_b}{N_0}$ becomes large, the denominator of the above bound for $P_b$ becomes essentially $1$ and the bound decreases as $z^5$. The bound for the uncoded case is $z$. The above bounds are plotted in Figure 6.8.

[Figure 6.8: Bit error probability: bounds and approximation, plotted as $P_b$ versus $E_s/N_0$ in dB. The curves are $z = e^{-E_s/N_0}$, $Q\big(\sqrt{2E_s/N_0}\big)$, $\frac{z^5}{(1-2z)^2}$, and the truncated upper bound derived in Exercise 15.]

In both cases the power spectral density is $\frac{E_s}{T}|\Phi(f)|^2$, where $\Phi(f)$ is the Fourier transform of $\phi(t)$ (see Exercise 1). Hence the bandwidth is the same in both cases. Since coding reduces the bit rate by a factor 2, the bandwidth efficiency, defined as the number of bits per second per Hz, is smaller by a factor of 2 in the coded case. With a more powerful code, we can further decrease the bit error probability without affecting the other parameters. The price for a more powerful convolutional code is that the constraint length is bigger, which implies more states and, in turn, a higher decoding complexity in terms of the number of operations per bit.


It is instructive to compare the performance of the chosen code to a trivial code that sends the information symbol twice. In this case the $j$th bit is sent as $b_j\sqrt{E_s}\,\phi(t - (2j-1)T) + b_j\sqrt{E_s}\,\phi(t - 2jT)$, which is the same as sending $b_j\sqrt{2E_s}\,\psi(t - 2jT)$ with $\psi(t) = \big(\phi(t) + \phi(t - T)\big)/\sqrt{2}$. This is bit-by-bit on a pulse train with the new pulse $\psi(t)$ used every $2T$ seconds. We know that the bit error probability of bit-by-bit on a pulse train does not depend on the pulse. In fact, it is the same as the original bit-by-bit on a pulse train, with $2E_s$ instead of $E_s$. If we use the upper bound, which is useful to compare the three systems, we obtain $P_b < z^2$. The bit rate and the energy per bit are the same as for the convolutionally encoded system.

From a high-level point of view, coding is about exploiting the advantages of working in a higher-dimensional signal space. (Bit-by-bit on a pulse train uses one dimension at a time.) In $n$ dimensions, we send some $c \in \mathbb{R}^n$ and receive $Y = c + Z$, where $Z \sim \mathcal{N}(0, I_n\sigma^2)$. By the law of large numbers, $\sqrt{(\sum Z_i^2)/n}$ goes to $\sigma$ as $n$ goes to infinity. This means that with probability approaching 1, the received $n$-tuple $Y$ will be in a thin shell of radius $\sigma\sqrt{n}$ around $c$. This phenomenon is referred to as sphere hardening. As $n$ becomes large, the region where we expect to find $Z$ becomes more predictable and in fact it becomes a small fraction of the entire space. Hence, there is hope that we will find many vector signals that are distinguishable (with high probability) even after the Gaussian noise has been added. Information theory tells us that we can make the probability of error go to zero as $n$ goes to infinity, provided that we use fewer than $m = 2^{nC}$ signals, where $C = \frac{1}{2}\log_2\big(1 + \frac{E_s}{\sigma^2}\big)$. It also teaches us that the probability of error cannot be made arbitrarily small if we use more than $2^{nC}$ signals. Since $(\log_2 m)/n$ is the number of bits per dimension that we are sending when we use $m$ signals embedded in an $n$-dimensional space, it is quite appropriate to call $C$ [bits/dimension] the capacity of the discrete-time additive white Gaussian noise channel. For the continuous-time AWGN channel of total bandwidth $W$, the channel capacity is

$$\frac{W}{2}\log_2\left(1 + \frac{2P}{WN_0}\right) \text{ [bits/sec]},$$

where $P$ is the transmitted power. It can be approached arbitrarily closely, with an error probability as small as desired, using signals of the form $w(t) = \sum_j c_j\phi(t - jT)$ and powerful codes.
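The two capacity expressions are easy to evaluate numerically; the sketch below (ours, with hypothetical parameter values) computes $C$ in bits per dimension and the continuous-time capacity in bits per second.

    import math

    def c_per_dimension(es, sigma2):
        """C = (1/2) log2(1 + Es / sigma^2)  [bits/dimension]."""
        return 0.5 * math.log2(1.0 + es / sigma2)

    def c_continuous_time(w, p, n0):
        """C = (W/2) log2(1 + 2P / (W N0))  [bits/sec]."""
        return (w / 2.0) * math.log2(1.0 + 2.0 * p / (w * n0))

    print(c_per_dimension(es=1.0, sigma2=0.5))       # about 0.79 bits/dimension
    print(c_continuous_time(w=1e6, p=1.0, n0=1e-6))  # about 7.9e5 bits/sec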

    Appendix 6.A Formal Definition of the Viterbi Algorithm

Let $\Sigma = \{(1, 1), (1, -1), (-1, 1), (-1, -1)\}$ be the state space and define the edge metric $\mu_{j-1,j}(\alpha, \beta)$ as follows. If there is an edge that connects state $\alpha$ at depth $j-1$ to state $\beta$ at depth $j$, let

$$\mu_{j-1,j}(\alpha, \beta) = x_{2j-1}y_{2j-1} + x_{2j}y_{2j},$$

where $x_{2j-1}, x_{2j}$ is the encoder output of the corresponding edge. If there is no such edge, we let $\mu_{j-1,j}(\alpha, \beta) = -\infty$.


Since $\mu_{j-1,j}(\alpha, \beta)$ is the $j$th term in $\langle x, y\rangle$ for any path that goes through state $\alpha$ at depth $j-1$ and state $\beta$ at depth $j$, $\langle x, y\rangle$ is obtained by adding the edge metrics along the path specified by $x$.

The path metric is the sum of the edge metrics taken along the edges of a path. A longest path from state $(1, 1)$ at depth $j = 0$, denoted $(1, 1)_0$, to a state $\beta$ at depth $j$, denoted $\beta_j$, is one of the paths that has the largest path metric. The Viterbi algorithm works by constructing, for each $j$, a list of the longest paths to the states at depth $j$. The following observation is key to understanding the Viterbi algorithm. If $\mathrm{path}_{j-1} \circ \beta_j$ is a longest path to state $\beta$ of depth $j$, where $\mathrm{path}_{j-1} \in \Sigma^{j-1}$ and $\circ$ denotes concatenation, then $\mathrm{path}_{j-1}$ must be a longest path to its end state $\alpha$ of depth $j-1$: if some other path to $\alpha$, say $\mathrm{alternatepath}_{j-1}$, had a larger metric, then $\mathrm{alternatepath}_{j-1} \circ \beta_j$ would have a larger metric than $\mathrm{path}_{j-1} \circ \beta_j$, contradicting the assumption. So the longest paths to the states at depth $j$ can be obtained by extending, by one branch, the longest paths to the states at depth $j-1$.

The following notation is useful for the formal description of the Viterbi algorithm. Let $\mu_j(\beta)$ be the metric of a longest path to state $\beta_j$ and let $B_j(\beta) \in \{\pm1\}^j$ be the encoder input sequence that corresponds to this path. We call $B_j(\beta)$ the survivor because it is the only path through $\beta_j$ that will be extended. (Paths through $\beta_j$ that have a smaller metric have no chance of extending into a maximum likelihood path.) For each state, the Viterbi algorithm computes two things: a survivor and its metric. The formal algorithm follows, where $b(\alpha, \beta)$ is the encoder input that corresponds to the transition from state $\alpha$ to state $\beta$ if there is such a transition, and is undefined otherwise.

1. Initially set $\mu_0(1, 1) = 0$, $\mu_0(\beta) = -\infty$ for all $\beta \ne (1, 1)$, $B_0(1, 1) = \emptyset$, and $j = 1$.

2. For each $\beta \in \Sigma$, find one of the $\alpha \in \Sigma$ for which $\mu_{j-1}(\alpha) + \mu_{j-1,j}(\alpha, \beta)$ is a maximum. Then set

$$\mu_j(\beta) \leftarrow \mu_{j-1}(\alpha) + \mu_{j-1,j}(\alpha, \beta), \qquad B_j(\beta) \leftarrow B_{j-1}(\alpha) \circ b(\alpha, \beta).$$

3. If $j = k + 2$, output the first $k$ bits of $B_j(1, 1)$ and stop. Otherwise increment $j$ by one and go to Step 2.

The reader should have no difficulty verifying (by induction on $j$) that $\mu_j(\beta)$ as computed by Viterbi's algorithm is indeed the metric of a longest path from $(1, 1)_0$ to state $\beta$ at depth $j$, and that $B_j(\beta)$ is the encoder input sequence associated with it.
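The formal description above maps directly onto a short program. The following is a runnable sketch (ours, not the text's) for the chapter's rate-1/2 encoder; the received sequence at the end is a hypothetical example, and ties are broken by keeping the first maximizer, consistent with "find one of the $\alpha$" in Step 2.

    def viterbi(y, k):
        """Decode 2(k+2) channel outputs; return the k information bits."""
        states = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # (b_{j-1}, b_{j-2})
        metric = {s: float("-inf") for s in states}     # mu_0
        metric[(1, 1)] = 0.0
        survivor = {(1, 1): []}                         # B_0
        for j in range(1, k + 3):                       # depths 1, ..., k+2
            inputs = [1, -1] if j <= k else [1]         # dummy bits b_{k+1} = b_{k+2} = 1
            new_metric = {s: float("-inf") for s in states}
            new_survivor = {}
            for (b1, b2) in states:                     # alpha = (b_{j-1}, b_{j-2})
                if metric[(b1, b2)] == float("-inf"):
                    continue                            # alpha not reachable
                for bj in inputs:
                    x1, x2 = bj * b2, bj * b1 * b2      # edge output symbols
                    m = metric[(b1, b2)] + x1 * y[2 * j - 2] + x2 * y[2 * j - 1]
                    beta = (bj, b1)                     # next state
                    if m > new_metric[beta]:            # survivor to beta
                        new_metric[beta] = m
                        new_survivor[beta] = survivor[(b1, b2)] + [bj]
            metric, survivor = new_metric, new_survivor
        return survivor[(1, 1)][:k]                     # drop the two dummy bits

    # Hypothetical received sequence for k = 5 (hence 2(k+2) = 14 observations):
    y = [1, 3, -2, 1, 4, -1, 5, 5, -3, -3, 1, -6, 2, -4]
    print(viterbi(y, k=5))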


    Appendix 6.B Exercises

Problem 1. (Power Spectral Density) Consider the random process

$$X(t) = \sum_{i=-\infty}^{\infty} X_i\sqrt{E_s}\,\phi(t - iT_s - T_0),$$

where $T_s$ and $E_s$ are fixed positive numbers, $\phi(t)$ is some unit-energy function, $T_0$ is a uniformly distributed random variable taking value in $[0, T_s)$, and $\{X_i\}_{i=-\infty}^{\infty}$ is the output of the convolutional encoder described by

$$X_{2n} = B_n B_{n-2}, \qquad X_{2n+1} = B_n B_{n-1} B_{n-2},$$

with iid input sequence $\{B_i\}_{i=-\infty}^{\infty}$ taking values in $\{\pm1\}$.

(a) Express the power spectral density of the above random process for a general $\phi(t)$.

(b) Plot the power spectral density for a rectangular pulse $\phi(t)$ of width $T_s$.

Problem 2. (Power Spectral Density: Correlative Encoding) Repeat Exercise 1 using the encoder

$$X_i = B_i B_{i-1}.$$

Compare this exercise to Exercise 4 of Chapter 5.

Problem 3. (Viterbi Algorithm) An output sequence $x_1, \ldots, x_{10}$ from the convolutional encoder of Figure 6.9 is transmitted over the discrete-time AWGN channel. The initial and final state of the encoder is $(1, 1)$. Using the Viterbi algorithm, find the maximum likelihood information sequence $b_1, \ldots, b_4, 1, 1$, knowing that $b_1, \ldots, b_4$ are drawn independently and uniformly from $\{\pm1\}$ and that the channel output is $y_1, \ldots, y_{10} = 1, 2, 1, 4, 2, 1, 1, 3, 1, 2$. (We are choosing integers for convenience.)

[Figure 6.9: Convolutional encoder with input $b_j \in \{\pm1\}$, a shift register holding $b_{j-1}$ and $b_{j-2}$, and serialized outputs $x_{2j}$ and $x_{2j+1}$.]


Problem 4. (Intersymbol Interference) We speak of intersymbol interference (ISI) when the sampled matched-filter output is a linear combination of various channel input symbols plus noise. ISI can result from the channel impulse response, or can be introduced by the encoder to shape the transmitted signal's spectrum, but in this case we speak of correlative or partial-response encoding (see Exercise 2). ISI can also be the result of sampling the matched filter output at the wrong time (see e.g. Section 7.6). To leverage what we have learned in this chapter, we model the intersymbol interference channel by an encoder followed by the AWGN channel, i.e.,

$$Y_i = X_i + Z_i, \qquad X_i = \sum_{j=0}^{L} B_{i-j}h_j, \quad i = 1, 2, \ldots \tag{6.4}$$

where $B_i$ is the $i$th information bit, $h_0, \ldots, h_L$ are coefficients that describe the intersymbol interference, and $Z_i$ is zero-mean, Gaussian, of variance $\sigma^2$ and statistically independent of everything else. The input/output relationship can be modeled by a trellis, as we have learned for convolutional encoders, and the ML decision rule can be implemented by the Viterbi algorithm.

(a) Draw the trellis that describes all sequences of the form $X_1, \ldots, X_7$ resulting from information sequences of the form $B_1, \ldots, B_5, 0$, $B_i \in \{0, 1\}$, assuming

$$h_i = \begin{cases} 1, & i = 0 \\ 2, & i = 1 \\ 0, & \text{otherwise.} \end{cases}$$

To determine the initial state, you may assume that the preceding information sequence terminated with $0$. Label the trellis edges with the input/output symbols.

(b) Specify a metric $f(x_1, \ldots, x_6) = \sum_{i=1}^{6} f(x_i, y_i)$ whose minimization or maximization with respect to the valid $x_1, \ldots, x_6$ leads to a maximum likelihood decision. Specify whether your metric needs to be minimized or maximized.

(c) Assume $y_1, \ldots, y_6 = 2, 0, 1, 1, 0, 1$. Find the maximum likelihood estimate of the information sequence $B_1, \ldots, B_5$.

Problem 5. (Linearity) In this exercise, we establish in what sense the encoder of Figure 6.2 is linear.

(a) For this part you might want to review the axioms of a field. (See e.g. K. Hoffman and R. Kunze, Linear Algebra, Prentice Hall, or your favorite linear algebra book.) Consider the set $F_0 = \{0, 1\}$ with the following addition and multiplication tables:

    +  | 0  1        .  | 0  1
    0  | 0  1        0  | 0  0
    1  | 1  0        1  | 0  1

(The addition in $F_0$ is the usual addition over $\mathbb{R}$ with the result taken modulo 2. The multiplication is the usual multiplication over $\mathbb{R}$, and there is no need for the modulo-2 operation because the result is automatically in $F_0$.) $F_0$, $+$, and $\cdot$ form a binary field, denoted by $\mathbb{F}_2$. Now consider $F = \{\pm1\}$ and the following addition and multiplication tables:

    +  |  1 -1       .  |  1 -1
    1  |  1 -1       1  |  1  1
    -1 | -1  1      -1  |  1 -1

(The addition in $F$ is the usual multiplication over $\mathbb{R}$.) Argue that $F$, $+$, and $\cdot$ form a binary field as well. Hint: The second set and operations can be obtained from the first set via the transformation $T : F_0 \to F$ that sends $0$ to $1$ and $1$ to $-1$. Hence, by construction, for $a, b \in F_0$, $T(a + b) = T(a) + T(b)$ and $T(a \cdot b) = T(a) \cdot T(b)$. Be aware of the double meaning of $+$ and $\cdot$ in the previous sentence.

(b) For this part you might want to review the notion of a vector space. Let $F_0$, $+$, and $\cdot$ be as defined in (a). Let $\mathcal{V} = F_0^{\infty}$. This is the set of infinite sequences taking values in $F_0$. Do $\mathcal{V}$, $F_0$, $+$, and $\cdot$ form a vector space? (Addition of vectors and multiplication of a vector with a scalar are done componentwise.) Repeat using $F$.

(c) For this part you might want to review the notion of a linear transformation. Let $f : \mathcal{V} \to \mathcal{V}$ be the transformation that sends an infinite sequence $b \in \mathcal{V}$ to an infinite sequence $x \in \mathcal{V}$ according to

$$x_{2j-1} = b_{j-1} + b_{j-2} + b_{j-3}, \qquad x_{2j} = b_j + b_{j-2},$$

where the $+$ is the one defined over the field of scalars implicit in $\mathcal{V}$. Argue that this $f$ is linear. Comment: When $\mathcal{V} = F^{\infty}$, this encoder is the one used throughout Chapter 6, with the only difference that in the chapter we multiply over $\mathbb{R}$ rather than adding over $F$, but this is just a matter of notation, the result of the two operations on the elements of $F$ being identical. The standard way to describe a convolutional encoder is to choose $F_0$ and the corresponding addition, namely addition modulo 2. See Problem 6 for the reason we opt for a non-standard description.

Problem 6. (Standard Description of a Convolutional Encoder) Figure 6.10(a) shows the standard way to describe the convolutional encoder we use as a case study in Chapter 6. Its input and output symbols are elements of $F_0 = \{0, 1\}$ and addition is modulo 2. Sending a scaled version of the output symbols, scaled so that the average energy is $E_s$, means using the channel input alphabet $\{0, \sqrt{2E_s}\}$. We are more energy efficient if we use the channel input alphabet $\{\pm\sqrt{E_s}\}$. Towards that end, we map the encoder output symbols into $F = \{\pm1\}$ by means of the map $T$ that sends $0$ to $1$ and $1$ to $-1$, also shown in the figure. Show that the mapped encoder output of Figure 6.10(a) is identical to the output of Figure 6.10(b), where $\tilde{b}_j = T(b_j) \in \{\pm1\}$. Hint: For $a$ and $b$ in $\mathbb{F}_2$, $T(a + b) = T(a)T(b)$.

[Figure 6.10: (a) Conventional description: input $b_j \in \{0, 1\}$, shift register holding $b_{j-1}$ and $b_{j-2}$, modulo-2 adders forming $x_{2j-1}$ and $x_{2j}$, each followed by the map $T$. (b) Description used in this text: input $b_j \in \{\pm1\}$ and multiplication over $\mathbb{R}$.]

Comment: the encoder of Figure 6.10(b) is linear over the field $F$ (see Exercise 5), whereas the encoder of Figure 6.10(a) is linear over $F_0$ only if we omit the output map $T$.

Problem 7. (Independence of the Distance Profile from the Reference Path) We want to show that $a(i, d)$ does not depend on the reference path. Recall that in Section 6.4.1 we define $a(i, d)$ as the number of detours that leave the reference path at some arbitrary, but fixed, trellis depth $j$ and have input distance $i$ and output distance $d$ with respect to the reference path.

(a) Let $b$ and $\tilde{b}$, both in $\{\pm1\}^{\infty}$, be two input sequences to the encoder of Figure 6.2 and let $f$ be the encoding map. The encoder is linear in the sense that the componentwise product over the reals $b\tilde{b}$ is also a valid input sequence and the corresponding output sequence is $f(b\tilde{b}) = f(b)f(\tilde{b})$ (see Exercise 5). Argue that the distance between $b$ and $\tilde{b}$ equals the distance between $b\tilde{b}$ and the all-one input sequence. Similarly, argue that the distance between $f(b)$ and $f(\tilde{b})$ equals the distance between $f(b\tilde{b})$ and the all-one output sequence (which is the output of the all-one input sequence).

(b) Fix an arbitrary reference path and an arbitrary detour that splits from the reference path at time $0$. Let $b$ and $\tilde{b}$ be the corresponding input sequences. Because the detour starts at time $0$, $b_i = \tilde{b}_i$ for $i < 0$ and $b_0 \ne \tilde{b}_0$. Argue that $\tilde{b}$ uniquely defines a detour $\bar{b}$ that splits from the all-one path at time $0$ and such that:

(i) the distance between $b$ and $\tilde{b}$ is the same as that between $\bar{b}$ and the all-one input sequence;

(ii) the distance between $f(b)$ and $f(\tilde{b})$ is the same as that between $f(\bar{b})$ and the all-one output sequence.

(c) Conclude that $a(i, d)$ does not depend on the reference path.

Problem 8. (Rate 1/3 Convolutional Code) For the convolutional encoder of Figure 6.11:

[Figure 6.11: Rate-1/3 convolutional encoder with input $b_n \in \{\pm1\}$, a shift register holding $b_{n-1}$ and $b_{n-2}$, and outputs $x_{3n} = b_n b_{n-2}$, $x_{3n+1} = b_{n-1} b_{n-2}$, $x_{3n+2} = b_n b_{n-1} b_{n-2}$.]

(a) Draw the state diagram and the detour flow graph.

(b) Suppose that the serialized encoder output symbols are scaled so that the resulting energy per bit is $E_b$ and sent over the AWGN channel of noise variance $\sigma^2 = N_0/2$. Derive an upper bound to the bit error probability, assuming that the decoder implements the Viterbi algorithm.

Problem 9. (Rate 2/3 Convolutional Code) The following equations describe the output sequence of a convolutional encoder that in each epoch takes $k_0 = 2$ input symbols from $\{\pm1\}$ and outputs $n_0 = 3$ symbols from the same alphabet:

$$x_{3n} = b_{2n}b_{2n-1}b_{2n-2}, \qquad x_{3n+1} = b_{2n+1}b_{2n-2}, \qquad x_{3n+2} = b_{2n+1}b_{2n}b_{2n-2}.$$


(a) Draw an implementation of the encoder based on delay elements and multipliers.

(b) Draw the state diagram.

(c) Suppose that the serialized encoder output symbols are scaled so that the resulting energy per bit is $E_b$ and sent over the AWGN channel of noise variance $\sigma^2 = N_0/2$. Derive an upper bound to the bit error probability, assuming that the decoder implements the Viterbi algorithm.

Problem 10. (Convolutional Encoder, Decoder and Error Probability) For the convolutional code described by the state diagram of Figure 6.12:

(a) Draw the encoder.

(b) As a function of the energy per bit $E_b$, upper bound the bit error probability of the Viterbi algorithm when the scaled encoder output sequence is transmitted over the discrete-time AWGN channel of noise variance $\sigma^2 = N_0/2$.

[Figure 6.12: State diagram with states $(1, 1)$, $(1, -1)$, $(-1, 1)$, $(-1, -1)$; transitions labeled $b_j \mid x_{2j-1}, x_{2j}$, with labels $1 \mid 1, -1$; $-1 \mid -1, -1$; $-1 \mid 1, 1$; $1 \mid 1, 1$; $-1 \mid -1, 1$; $1 \mid -1, -1$; $1 \mid -1, 1$; $-1 \mid 1, -1$.]

Problem 11. (Trellis with Antipodal Signals) Figure 6.13(a) shows a trellis section labeled with the output symbols $x_{2j-1}, x_{2j}$ of a convolutional encoder. Notice how branches that are the mirror image of each other have antipodal output symbols (symbols that are the negative of each other). The purpose of this exercise is to see that when the trellis has this particular structure and codewords are sent through the AWGN channel, the maximum likelihood sequence detector further simplifies (with respect to the Viterbi algorithm).

[Figure 6.13: (a) A trellis section between depths $j-1$ and $j$ with states $+1$ and $-1$; branches are labeled with output symbols, and mirror-image branches carry antipodal labels. (b) Two consecutive trellis sections, between depths $j-1$, $j$, and $j+1$, labeled with branch metrics $\pm a, \pm b$ and $\pm c, \pm d$; the mirror symmetry of (a) induces the same kind of symmetry here.]

    Figure 6.13(b) shows two consecutive trellis sections labeled with the branch metric.Notice that the mirror symmetry of Figure (a) implies the same kind of symmetry forFigure (b). The maximum likelihood path is the one that has the largest path metric. Toavoid irrelevant complications we assume that there is only one path that maximizes thepath metric.

    (a) Let j {1} be the state visited by the maximum likelihood path at depth j .Suppose that a genie informs the decoder that j1 = j+1 = 1 . Write down thenecessary and sufficient condition for j = 1 .

    (b) Repeat for the remaining three possibilities of j1 and j+1 . Does the necessaryand sufficient condition for j = 1 depend on the value of j1 and j+1 ?

    (c) The brach metric for the branch with output symbols x2j1, x2j is x2j1y2j1 +x2jy2j , where yj is xj plus noise. Using the result of the previous part, spec-ify a maximum likelihood sequence decision for j = 1 based on the observationy2j1, y2j, y2j+1, y2j+2 .

Problem 12. (Viterbi for the Binary Erasure Channel) Consider the convolutional encoder of Figure 6.14, with inputs and outputs over $\{0, 1\}$ and addition modulo 2. Its output is sent over the binary erasure channel described by

$$P_{Y|X}(0|0) = P_{Y|X}(1|1) = 1 - \epsilon, \qquad P_{Y|X}(?|0) = P_{Y|X}(?|1) = \epsilon, \qquad P_{Y|X}(1|0) = P_{Y|X}(0|1) = 0.$$


[Figure 6.14: Convolutional encoder over $\{0, 1\}$: input $b_j \in \{0, 1\}$, shift register holding $b_{j-1}$ and $b_{j-2}$, modulo-2 adders producing $x_{2j-1} = b_{j-1} + b_{j-2}$ and $x_{2j} = b_j + b_{j-2}$.]

(a) Draw a trellis section that describes the encoder map.

(b) Derive the branch metric and specify whether a maximum likelihood decoder chooses the path with the largest or the smallest path metric.

(c) Suppose that the initial encoder state is $(0, 0)$ and that the channel output is $\{0, ?, ?, 1, 0, 1\}$. What is the most likely information sequence?

(d) Derive an upper bound to the bit error probability.

Problem 13. (Sampling Error) A transmitter sends

$$X(t) = \sum_i B_i\,\phi(t - iT),$$

where $\{B_i\}_{i=-\infty}^{\infty}$, $B_i \in \{-1, 1\}$, is a sequence of independent and uniformly distributed bits and $\phi(t)$ is a centered and unit-energy rectangular pulse of width $T$. The communication channel between the transmitter and the receiver is the AWGN channel of power spectral density $\frac{N_0}{2}$. At the receiver, the channel output $Z(t)$ is passed through a filter matched to $\phi(t)$, and the output is sampled, ideally at times $t_k = kT$, $k$ integer.

(a) Consider that there is a timing error, i.e., the sampling time is $t_k = kT - \tau$ where $\tau = 0.25\,T$. Ignoring the noise, express the matched filter output observation $w_k$ at time $t_k$ as a function of the bit values $b_k$ and $b_{k-1}$.

(b) Extending to the noisy case, let $r_k = w_k + z_k$ be the $k$th matched filter output observation. The receiver is not aware of the timing error. Compute the resulting error probability.

(c) Now assume that the receiver knows the timing error $\tau$ (same as above) but cannot correct for it. (This could be the case if the timing error becomes known once the samples are taken.) Draw and label four sections of a trellis that describes the noise-free sampled matched filter output for each input sequence $b_1, b_2, b_3, b_4$. In your trellis, take into consideration the fact that the matched filter is at rest before $x(t) = \sum_{i=1}^{4} b_i\,\phi(t - iT)$ enters the filter.

(d) Suppose that the sampled matched filter output consists of $2, 0.5, 0, 1$. Use the Viterbi algorithm to decide on the transmitted bit sequence.


    (d) Suppose that the sampled matched filter output consists of 2, 0.5, 0, 1 . Use theViterbi algorithm to decide on the transmitted bit sequence.

Problem 14. (Simulation) The purpose of this exercise is to determine, by simulation, the bit error probability of the communication system studied in this chapter. For the simulation, we recommend using MATLAB, as it has high-level functions for the various tasks, notably for generating a random information sequence, for doing convolutional encoding, for simulating the discrete-time AWGN channel, and for decoding by means of the Viterbi algorithm. Although the actual simulation is on the discrete-time AWGN channel, we specify a continuous-time setup. It is part of your task to translate the continuous-time specifications into what you need for the simulation. We begin with the uncoded version of the system of interest.

(a) By simulation, determine the minimum obtainable bit error probability $P_b$ of bit-by-bit on a pulse train transmitted over the AWGN channel. Specifically, the channel input signal has the form

$$X(t) = \sum_j X_j\,\phi(t - jT),$$

where the symbols are iid and take value in $\{\pm\sqrt{E_s}\}$, and the pulse $\phi(t)$ has unit norm and is orthogonal to its $T$-fold translates. Plot $P_b$ as a function of $E_s/N_0$ in the range from 1 to 6 dB, where $N_0/2$ is the noise power spectral density. Verify your results with Figure 6.8. Hint: the following are useful MATLAB functions: To be added.

(b) Repeat with the symbol sequence being the output of the convolutional encoder of Figure 6.2 multiplied by $\sqrt{E_s}$. The decoder shall implement the Viterbi algorithm. Also in this case you can verify your results by comparing with Figure 6.8.

Problem 15. (Bit Error Probability) In the process of upper bounding the bit error probability, in Section 6.4.2 we make the following step:

$$E[\Omega_j] \le \sum_{i=1}^{\infty}\sum_{d=1}^{\infty} i\,Q\left(\sqrt{\frac{2E_s d}{N_0}}\right)a(i, d) \le \sum_{i=1}^{\infty}\sum_{d=0}^{\infty} i\,z^d a(i, d).$$

(a) Instead of upper bounding the $Q$-function as done above, use the results of Section 6.4.1 to substitute $a(i, d)$ and $d$ with explicit functions of $i$ and get rid of the second sum. You should obtain

$$P_b \le \sum_{i=1}^{\infty} i\,Q\left(\sqrt{\frac{2E_s(i + 4)}{N_0}}\right)2^{i-1}.$$


    6.B. Exercises 223

    (b) Truncate the above sum to the first 5 terms and evaluate it numerically for (Es/N0)between 1 and 6 dB. Plot the results and compare to Figure6.8.
