Convolutional Codes

ECEN 5682 Theory and Practice of Error Control Codes

Peter Mathys
University of Colorado
Spring 2007
Outline: Basic Definitions, Convolutional Encoders / Encoder State Diagrams / Viterbi Decoding Algorithm
Basic Definitions, Convolutional Encoders
Linear (n, k) block codes take k data symbols at a time and encode them into n code symbols. Long data sequences are broken up into blocks of k symbols and each block is encoded independently of all others. Convolutional encoders, on the other hand, convert an entire data sequence, regardless of its length, into a single code sequence by using convolution and multiplexing operations. In general, it is convenient to assume that both the data sequences (u0, u1, . . .) and the code sequences (c0, c1, . . .) are semi-infinite sequences and to express them in the form of a power series.
Definition: The power series associated with the data sequence u = (u0, u1, u2, . . .) is defined as

u(D) = u0 + u1 D + u2 D^2 + · · · = ∑_{i=0}^∞ u_i D^i ,

where u(D) is called the data power series. Similarly, the code power series c(D) associated with the code sequence c = (c0, c1, c2, . . .) is defined as

c(D) = c0 + c1 D + c2 D^2 + · · · = ∑_{i=0}^∞ c_i D^i .
The indeterminate D has the meaning of delay, similar to z^{-1} in the z transform, and D is sometimes called the delay operator.
A general rate R = k/n convolutional encoder converts k data sequences into n code sequences using a k × n transfer function matrix G(D) as shown in the following figure.
Fig.1 Block Diagram of a k-Input, n-Output Convolutional Encoder: u(D) → Demux → u^(1)(D), . . . , u^(k)(D) → Convolutional Encoder G(D) → c^(1)(D), . . . , c^(n)(D) → Mux → c(D)
The data power series u(D) is split up into k subsequences, denoted u^(1)(D), u^(2)(D), . . . , u^(k)(D) in power series notation, using a demultiplexer whose details are shown in the figure below.
u(D) −→ (u0, u1, . . . , u_{k−1}, u_k, u_{k+1}, . . . , u_{2k−1}, u_{2k}, u_{2k+1}, . . . , u_{3k−1}, . . .)

(u_0^(1), u_1^(1), u_2^(1), . . .) −→ u^(1)(D)
(u_0^(2), u_1^(2), u_2^(2), . . .) −→ u^(2)(D)
  ...
(u_0^(k), u_1^(k), u_2^(k), . . .) −→ u^(k)(D)

i.e., u_i^(h) = u_{ik+h−1}.

Fig.2 Demultiplexing from u(D) into u^(1)(D), . . . , u^(k)(D)
The code subsequences, denoted by c^(1)(D), c^(2)(D), . . . , c^(n)(D) in power series notation, at the output of the convolutional encoder are multiplexed into a single power series c(D) for transmission over a channel, as shown below.
c^(1)(D) −→ (c_0^(1), c_1^(1), c_2^(1), . . .)
c^(2)(D) −→ (c_0^(2), c_1^(2), c_2^(2), . . .)
  ...
c^(n)(D) −→ (c_0^(n), c_1^(n), c_2^(n), . . .)

(c0, c1, . . . , c_{n−1}, c_n, c_{n+1}, . . . , c_{2n−1}, c_{2n}, c_{2n+1}, . . . , c_{3n−1}, . . .) −→ c(D)

i.e., c_i^(ℓ) = c_{in+ℓ−1}.

Fig.3 Multiplexing of c^(1)(D), . . . , c^(n)(D) into Single Output c(D)
Definition: A q-ary generator polynomial of degree m is a polynomial in D of the form

g(D) = g0 + g1 D + g2 D^2 + · · · + g_m D^m = ∑_{i=0}^m g_i D^i ,

with m + 1 q-ary coefficients g_i. The degree m is also called the memory order of g(D).
Consider computing the product (using modulo q arithmetic)
c(D) = u(D) g(D) .
Written out, this looks as follows
c0 + c1 D + c2 D^2 + · · ·
  = (g0 + g1 D + g2 D^2 + · · · + g_m D^m)(u0 + u1 D + u2 D^2 + · · ·)
  = g0 u0 + g0 u1 D + g0 u2 D^2 + · · · + g0 u_m D^m + g0 u_{m+1} D^{m+1} + g0 u_{m+2} D^{m+2} + · · ·
  +          g1 u0 D + g1 u1 D^2 + · · · + g1 u_{m−1} D^m + g1 u_m D^{m+1} + g1 u_{m+1} D^{m+2} + · · ·
  +                    g2 u0 D^2 + · · · + g2 u_{m−2} D^m + g2 u_{m−1} D^{m+1} + g2 u_m D^{m+2} + · · ·
  ...
  +                                        g_m u0 D^m + g_m u1 D^{m+1} + g_m u2 D^{m+2} + · · ·
Thus, the coefficients of c(D) are

c_j = ∑_{i=0}^m g_i u_{j−i} , j = 0, 1, 2, . . . , where u_ℓ = 0 for ℓ < 0 ,

i.e., the code sequence (c0, c1, c2, . . .) is the convolution of the data sequence (u0, u1, u2, . . .) with the generator sequence (g0, g1, . . . , g_m).
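This discrete convolution (coefficient-wise mod q) is easy to sketch in code; the following is a minimal illustration (the function name is ours, not from the slides):

```python
def conv_encode(u, g, q=2):
    """Convolve data sequence u with generator sequence g, coefficients mod q."""
    m = len(g) - 1  # memory order of the generator
    return [sum(g[i] * u[j - i] for i in range(m + 1) if 0 <= j - i < len(u)) % q
            for j in range(len(u) + m)]

# g(D) = 1 + D + D^2 applied to u(D) = 1 + D^3 (binary, q = 2)
print(conv_encode([1, 0, 0, 1], [1, 1, 1]))  # → [1, 1, 1, 1, 1, 1]
```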
A convenient way to implement the convolution

c_j = ∑_{i=0}^m g_i u_{j−i} , j = 0, 1, 2, . . . , where u_ℓ = 0 for ℓ < 0 ,

is to use a shift register with m memory cells (cleared to zero at time t = 0), as shown in the following figure.
Fig.4 Block Diagram for Convolution of u(D) with g(D): the input . . . , u2, u1, u0 feeds a shift register of m memory cells; the tap outputs weighted by g0, g1, g2, . . . , g_m are summed to produce . . . , c2, c1, c0
A general k-input, n-output convolutional encoder consists of k such shift registers, each of which is connected to the outputs via n generator polynomials.
Definition: A q-ary linear and time-invariant convolutional encoder with k inputs and n outputs is specified by a k × n matrix G(D), called transfer function matrix, which consists of generator polynomials g_h^(ℓ)(D), h = 1, 2, . . . , k, ℓ = 1, 2, . . . , n, as follows

G(D) = | g_1^(1)(D)  g_1^(2)(D)  . . .  g_1^(n)(D) |
       | g_2^(1)(D)  g_2^(2)(D)  . . .  g_2^(n)(D) |
       |     ...         ...               ...     |
       | g_k^(1)(D)  g_k^(2)(D)  . . .  g_k^(n)(D) | .

The generator polynomials have q-ary coefficients, degree m_{hℓ}, and are of the form

g_h^(ℓ)(D) = g_{0h}^(ℓ) + g_{1h}^(ℓ) D + g_{2h}^(ℓ) D^2 + · · · + g_{m_{hℓ}h}^(ℓ) D^{m_{hℓ}} .
Define the power series vectors
u(D) = [u(1)(D), u(2)(D), . . . , u(k)(D)] ,
c(D) = [c(1)(D), c(2)(D), . . . , c(n)(D)] .
The operation of a k-input, n-output convolutional encoder can then be concisely expressed as c(D) = u(D) G(D). Each individual output sequence is obtained as

c^(ℓ)(D) = ∑_{h=1}^k u^(h)(D) g_h^(ℓ)(D) .
Note: By setting u^(h)(D) = 1 in the above equation, it is easily seen that the generator sequence

(g_{0h}^(ℓ), g_{1h}^(ℓ), g_{2h}^(ℓ), . . . , g_{m_{hℓ}h}^(ℓ)) ,

is the unit impulse response from input h to output ℓ of the convolutional encoder.
Definition: The total memory M of a convolutional encoder is the total number of memory elements in the encoder, i.e.,

M = ∑_{h=1}^k max_{1≤ℓ≤n} m_{hℓ} .

Note that max_{1≤ℓ≤n} m_{hℓ} is the number of memory cells, or the memory order, of the shift register for the input with index h.

Definition: The maximal memory order m of a convolutional encoder is the length of the longest input shift register, i.e.,

m = max_{1≤h≤k} max_{1≤ℓ≤n} m_{hℓ} .

Equivalently, m is equal to the highest degree of any of the generator polynomials in G(D).
Definition: The constraint length K of a convolutional encoder is the maximum number of symbols in a single output stream that can be affected by any input symbol, i.e.,

K = 1 + m = 1 + max_{1≤h≤k} max_{1≤ℓ≤n} m_{hℓ} .

Note: This definition of constraint length is not in universal use. Some authors define constraint length to be the maximum number of symbols in all output streams that can be affected by any input symbol, which is nK in the notation used here.
Example: Encoder #1. Binary rate R = 1/2 encoder with constraint length K = 3 and transfer function matrix

G(D) = [ g^(1)(D)  g^(2)(D) ] = [ 1 + D^2   1 + D + D^2 ] .
A block diagram for this encoder is shown in the figure below.

Fig.5 Binary Rate 1/2 Convolutional Encoder with K = 3: the input . . . , u2, u1, u0 feeds a 2-cell shift register; the taps of 1 + D^2 produce . . . , c_2^(1), c_1^(1), c_0^(1), and the taps of 1 + D + D^2 produce . . . , c_2^(2), c_1^(2), c_0^(2)
At time t = 0 the contents of the two memory cells are assumed to be zero. Using this encoder, the data sequence

u = (u0, u1, u2, . . .) = (1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, . . .) ,

for example, is encoded as follows

u     = 110100111010...
uD^2  =   110100111010...
--------------------------
c^(1) = 111001110100...

u     = 110100111010...
uD    =  110100111010...
uD^2  =   110100111010...
--------------------------
c^(2) = 100011101001...
After multiplexing this becomes

c = (c0 c1, c2 c3, c4 c5, . . .) = (c_0^(1) c_0^(2), c_1^(1) c_1^(2), c_2^(1) c_2^(2), . . .)
  = (11, 10, 10, 00, 01, 11, 11, 10, 01, 10, 00, 01, . . .) .

The pairs of code symbols that each data symbol generates are called code frames.
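The same encoding can be reproduced with a 2-cell shift register; a minimal sketch (function name is illustrative, binary arithmetic assumed):

```python
def encode_rate_half(u, g1=(1, 0, 1), g2=(1, 1, 1)):
    """Encode with the rate-1/2 encoder G(D) = [1+D^2  1+D+D^2] (encoder #1)."""
    d1 = d2 = 0  # memory cells, cleared at t = 0
    frames = []
    for ut in u:
        c1 = (g1[0] * ut + g1[1] * d1 + g1[2] * d2) % 2  # taps of 1 + D^2
        c2 = (g2[0] * ut + g2[1] * d1 + g2[2] * d2) % 2  # taps of 1 + D + D^2
        frames.append((c1, c2))
        d1, d2 = ut, d1  # shift
    return frames

u = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
print(encode_rate_half(u)[:4])  # → [(1, 1), (1, 0), (1, 0), (0, 0)]
```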
Definition: Consider a rate R = k/n convolutional encoder, let

u = (u0 u1 . . . u_{k−1}, u_k u_{k+1} . . . u_{2k−1}, u_{2k} u_{2k+1} . . . u_{3k−1}, . . .)
  = (u_0^(1) u_0^(2) . . . u_0^(k), u_1^(1) u_1^(2) . . . u_1^(k), u_2^(1) u_2^(2) . . . u_2^(k), . . .) ,

and let

c = (c0 c1 . . . c_{n−1}, c_n c_{n+1} . . . c_{2n−1}, c_{2n} c_{2n+1} . . . c_{3n−1}, . . .)
  = (c_0^(1) c_0^(2) . . . c_0^(n), c_1^(1) c_1^(2) . . . c_1^(n), c_2^(1) c_2^(2) . . . c_2^(n), . . .) .

Then the set of data symbols (u_{ik} u_{ik+1} . . . u_{(i+1)k−1}) is called the i-th data frame and the corresponding set of code symbols (c_{in} c_{in+1} . . . c_{(i+1)n−1}) is called the i-th code frame for i = 0, 1, 2, . . . .
Example: Encoder #2. Binary rate R = 2/3 encoder with constraint length K = 2 and transfer function matrix

G(D) = | g_1^(1)(D)  g_1^(2)(D)  g_1^(3)(D) |  =  | 1 + D   D   1 + D |
       | g_2^(1)(D)  g_2^(2)(D)  g_2^(3)(D) |     |   D     1     1   | .

A block diagram for this encoder is shown in the figure below.
Fig.6 Binary Rate 2/3 Convolutional Encoder with K = 2: the inputs . . . , u_2^(1), u_1^(1), u_0^(1) and . . . , u_2^(2), u_1^(2), u_0^(2) each feed one memory cell; the tap sums produce the outputs . . . , c_1^(1), c_0^(1); . . . , c_1^(2), c_0^(2); and . . . , c_1^(3), c_0^(3)
In this case the data sequence

u = (u0 u1, u2 u3, u4 u5, . . .) = (11, 01, 00, 11, 10, 10, . . .) ,

is first demultiplexed into u^(1) = (1, 0, 0, 1, 1, 1, . . .) and u^(2) = (1, 1, 0, 1, 0, 0, . . .), and then encoded as follows

u^(1)  = 100111...
u^(1)D =  100111...
u^(2)D =  110100...
-------------------
c^(1)  = 101110...

u^(1)D =  100111...
u^(2)  = 110100...
-------------------
c^(2)  = 100111...

u^(1)  = 100111...
u^(1)D =  100111...
u^(2)  = 110100...
-------------------
c^(3)  = 000000...
Multiplexing the code sequences c^(1), c^(2), and c^(3) yields the single code sequence

c = (c0 c1 c2, c3 c4 c5, . . .) = (c_0^(1) c_0^(2) c_0^(3), c_1^(1) c_1^(2) c_1^(3), . . .)
  = (110, 000, 100, 110, 110, 010, . . .) .

Because this is a rate 2/3 encoder, data frames of length 2 are encoded into code frames of length 3.
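The general operation c(D) = u(D)G(D) can be sketched with polynomial arithmetic; here is a minimal, illustrative implementation (helper names are ours) checked against encoder #2:

```python
def poly_mul(a, b, q=2):
    """Multiply two polynomials in D, coefficients mod q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out

def encode(u_streams, G, q=2):
    """c^(l)(D) = sum_h u^(h)(D) g_h^(l)(D); returns the n output streams."""
    k, n = len(G), len(G[0])
    L = len(u_streams[0])
    outs = []
    for l in range(n):
        acc = [0] * (L + max(len(G[h][l]) for h in range(k)) - 1)
        for h in range(k):
            for j, pj in enumerate(poly_mul(u_streams[h], G[h][l], q)):
                acc[j] = (acc[j] + pj) % q
        outs.append(acc)
    return outs

# Encoder #2: G(D) = [[1+D, D, 1+D], [D, 1, 1]] as coefficient lists
G = [[[1, 1], [0, 1], [1, 1]],
     [[0, 1], [1],    [1]]]
u1, u2 = [1, 0, 0, 1, 1, 1], [1, 1, 0, 1, 0, 0]
c1, c2, c3 = encode([u1, u2], G)
print(c1[:6], c2[:6], c3[:6])  # → [1, 0, 1, 1, 1, 0] [1, 0, 0, 1, 1, 1] [0, 0, 0, 0, 0, 0]
```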
Definition: Let u = (u0, u1, u2, . . .) be a data sequence (before demultiplexing) and let c = (c0, c1, c2, . . .) be the corresponding code sequence (after multiplexing). Then, in analogy to block codes, the generator matrix G of a convolutional encoder is defined such that
c = uG .
Note that G for a convolutional encoder has infinitely many rowsand columns.
Let G(D) = [g_h^(ℓ)(D)] be the transfer function matrix of a convolutional encoder with generator polynomials g_h^(ℓ)(D) = ∑_{i=0}^m g_{ih}^(ℓ) D^i, h = 1, 2, . . . , k, ℓ = 1, 2, . . . , n, where m is the maximal memory order of the encoder. Define the matrices

G_i = | g_{i1}^(1)  g_{i1}^(2)  . . .  g_{i1}^(n) |
      | g_{i2}^(1)  g_{i2}^(2)  . . .  g_{i2}^(n) |
      |     ...         ...               ...     |
      | g_{ik}^(1)  g_{ik}^(2)  . . .  g_{ik}^(n) | ,   i = 0, 1, 2, . . . , m .
In terms of these matrices, the generator matrix G can be conveniently expressed as (all entries below the diagonal are zero)

G = | G0  G1  G2  . . .  Gm      0       0   . . . |
    |     G0  G1  . . .  G_{m−1} Gm      0   . . . |
    |         G0  . . .  G_{m−2} G_{m−1} Gm  . . . |
    |                    . . .                     |
    |                    G0      G1      G2  . . . |
    |                            G0      G1  . . . |
    |                                    G0  . . . | .
Note that the first row of this matrix is the unit impulse response (after multiplexing the outputs) from input stream 1, the second row is the unit impulse response (after multiplexing the outputs) from input stream 2, etc.
Example: Encoder #1 has m = 2,

G0 = [1 1] , G1 = [0 1] , G2 = [1 1] ,

and thus generator matrix

G = | 11 01 11 00 00 00 . . . |
    | 00 11 01 11 00 00 . . . |
    | 00 00 11 01 11 00 . . . |
    | 00 00 00 11 01 11 . . . |
    |  .  .  .  .  .  .       | .
Using this, it is easy to compute, for example, the list of (non-zero) datawords and corresponding codewords shown on the next page.
u = (u0, u1, . . .)   c = (c0 c1, c2 c3, . . .)
1,0,0,0,0,...         11,01,11,00,00,00,00,...
1,1,0,0,0,...         11,10,10,11,00,00,00,...
1,0,1,0,0,...         11,01,00,01,11,00,00,...
1,1,1,0,0,...         11,10,01,10,11,00,00,...
1,0,0,1,0,...         11,01,11,11,01,11,00,...
1,1,0,1,0,...         11,10,10,00,01,11,00,...
1,0,1,1,0,...         11,01,00,10,10,11,00,...
1,1,1,1,0,...         11,10,01,01,10,11,00,...
One thing that can be deduced from this list is that most likely the minimum weight of any non-zero codeword is 5, and thus, because convolutional codes are linear, the minimum distance, called minimum free distance for convolutional codes for historical reasons, is dfree = 5.
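This observation can be checked exhaustively over short datawords by rebuilding a finite slice of the generator matrix G; a sketch (function name is ours, k = 1 assumed):

```python
from itertools import product

def gen_matrix_rows(Gi, L, n):
    """Rows of the semi-infinite G, truncated to L data symbols (k = 1 case)."""
    m = len(Gi) - 1
    rows = []
    for r in range(L):
        row = [0] * ((L + m) * n)
        for i, block in enumerate(Gi):  # place G0 .. Gm starting at column r*n
            for j, bit in enumerate(block):
                row[(r + i) * n + j] = bit
        rows.append(row)
    return rows

# Encoder #1: G0 = [1,1], G1 = [0,1], G2 = [1,1]
rows = gen_matrix_rows([[1, 1], [0, 1], [1, 1]], L=5, n=2)
wmin = min(sum(sum(u[r] * rows[r][j] for r in range(5)) % 2 for j in range(len(rows[0])))
           for u in product([0, 1], repeat=5) if any(u))
print(wmin)  # → 5
```

Since each 5-symbol dataword's complete codeword fits inside the (5 + 2)-frame slice, the minimum over all 31 non-zero datawords confirms dfree = 5.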
Example: Encoder #2 has m = 1,

G0 = | 1 0 1 |      G1 = | 1 1 1 |
     | 0 1 1 | ,         | 1 0 0 | ,

and therefore generator matrix

G = | 101 111 000 000 000 . . . |
    | 011 100 000 000 000 . . . |
    | 000 101 111 000 000 . . . |
    | 000 011 100 000 000 . . . |
    | 000 000 101 111 000 . . . |
    | 000 000 011 100 000 . . . |
    | 000 000 000 101 111 . . . |
    | 000 000 000 011 100 . . . |
    |  .   .   .   .   .        | .
The first few non-zero codewords that this encoder produces are

u = (u0 u1, . . .)   c = (c0 c1 c2, . . .)
10,00,00,...         101,111,000,000,...
01,00,00,...         011,100,000,000,...
11,00,00,...         110,011,000,000,...
10,10,00,...         101,010,111,000,...
01,10,00,...         011,001,111,000,...
11,10,00,...         110,110,111,000,...
10,01,00,...         101,100,100,000,...
01,01,00,...         011,111,100,000,...
11,01,00,...         110,000,100,000,...
10,11,00,...         101,001,011,000,...
01,11,00,...         011,010,011,000,...
11,11,00,...         110,101,011,000,...
Definition: The code generated by a q-ary convolutional encoder with transfer function matrix G(D) is the set of all vectors of semi-infinite sequences of encoded symbols c(D) = u(D)G(D), where u(D) is any vector of q-ary data sequences.

Definition: Two convolutional encoders with transfer function matrices G1(D) and G2(D) are said to be equivalent if they generate the same codes.

Definition: A systematic convolutional encoder is a convolutional encoder whose codewords have the property that each data frame appears unaltered in the first k positions of the first code frame that it affects.

Note: When dealing with convolutional codes and encoders it is important to carefully distinguish between the properties of the code (e.g., the minimum distance of a code) and the properties of the encoder (e.g., whether an encoder is systematic or not).
Example: Neither encoder #1 nor encoder #2 is systematic. But the following binary rate 1/3 encoder, which will be called encoder #3, with constraint length K = 4 and transfer function matrix

G(D) = [ 1   1 + D + D^3   1 + D + D^2 + D^3 ] ,

is a systematic convolutional encoder. Its generator matrix is

G = | 111 011 001 011 000 000 000 . . . |
    | 000 111 011 001 011 000 000 . . . |
    | 000 000 111 011 001 011 000 . . . |
    | 000 000 000 111 011 001 011 . . . |
    |  .   .   .   .   .   .   .        | .

Note that the first column of each triplet of columns has only a single 1 in it, so that the first symbol in each code frame is the corresponding data symbol from the data sequence u.
Much more interesting systematic encoders can be obtained if one allows not only FIR (finite impulse response), but also IIR (infinite impulse response) filters in the encoder. In terms of the transfer function matrix G(D), this means that the use of rational polynomial expressions instead of generator polynomials as matrix elements is allowed. The following example illustrates this.
Example: Encoder #4. Binary rate R = 1/3 systematic encoder with constraint length K = 4 and rational transfer function matrix

G(D) = [ 1   (1 + D + D^3)/(1 + D^2 + D^3)   (1 + D + D^2 + D^3)/(1 + D^2 + D^3) ] .
A block diagram of this encoder is shown in the next figure.
Fig.7 Binary Rate 1/3 Systematic Convolutional Encoder with K = 4: the input u(D) passes straight through as c^(1)(D); a 3-cell shift register with feedback taps corresponding to the common denominator 1 + D^2 + D^3 generates c^(2)(D) and c^(3)(D)
Encoder State Diagrams
Convolutional encoders have total memory M. Thus, a time-invariant q-ary encoder can be regarded as a finite state machine (FSM) with q^M states and it can be completely described by a state transition diagram called encoder state diagram. Such a state diagram can be used to encode a data sequence of arbitrary length. In addition, the encoder state diagram can also be used to obtain important information about the performance of a convolutional code and its associated encoder.
Example: Encoder state diagram for encoder #1. This is a binary encoder with G(D) = [1 + D^2   1 + D + D^2] that uses 2 memory cells and 2^2 = 4 states. With reference to the block diagram in Figure 5, label the encoder states as follows:

S0 = 00 , S1 = 10 , S2 = 01 , S3 = 11 ,

where the first binary digit corresponds to the content of the first (leftmost) delay cell of the encoder, and the second digit corresponds to the content of the second delay cell.
At any given time t (measured in frames), the encoder is in a particular state S^(t). The next state, S^(t+1), at time t + 1 depends on the value of the data frame at time t, which in the case of a rate R = 1/2 encoder is just simply u_t. The code frame c_t^(1) c_t^(2) that the encoder outputs at time t depends only on S^(t) and u_t (and the transfer function matrix G(D), of course). Thus, the possible transitions between the states are labeled with u_t/c_t^(1) c_t^(2). The resulting encoder state diagram for encoder #1 is shown in the following figure.
Fig.8 Encoder State Diagram for Binary Rate 1/2 Encoder with K = 3. The transitions (labeled u_t/c_t^(1) c_t^(2)) are:

from S0: 0/00 → S0 , 1/11 → S1
from S1: 0/01 → S2 , 1/10 → S3
from S2: 0/11 → S0 , 1/00 → S1
from S3: 0/10 → S2 , 1/01 → S3
To encode the data sequence u = (0, 1, 0, 1, 1, 1, 0, 0, 1, . . .), for instance, start in S0 at t = 0, return to S0 at t = 1 because u0 = 0, then move on to S1 at t = 2, S2 at t = 3, S1 at t = 4, S3 at t = 5, S3 at t = 6 (self loop around S3), S2 at t = 7, S0 at t = 8, and finally S1 at t = 9. The resulting code sequence (after multiplexing) is

c = (00, 11, 01, 00, 10, 01, 10, 11, 11, . . .) .
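The state diagram can be generated mechanically from the shift-register description; a sketch for encoder #1 (function name is illustrative) that also reproduces the walk above:

```python
def state_table(g1=(1, 0, 1), g2=(1, 1, 1)):
    """Next-state/output table for encoder #1; state = (d1, d2), d1 = leftmost cell."""
    table = {}
    for d1 in (0, 1):
        for d2 in (0, 1):
            for u in (0, 1):
                c1 = (g1[0] * u + g1[1] * d1 + g1[2] * d2) % 2
                c2 = (g2[0] * u + g2[1] * d1 + g2[2] * d2) % 2
                table[(d1, d2), u] = ((u, d1), (c1, c2))  # (next state, code frame)
    return table

# walk the diagram for u = (0,1,0,1,1,1,0,0,1), starting in S0 = (0,0)
tbl, s, out = state_table(), (0, 0), []
for u in [0, 1, 0, 1, 1, 1, 0, 0, 1]:
    s, frame = tbl[s, u]
    out.append(frame)
print(out)  # code frames 00, 11, 01, 00, 10, 01, 10, 11, 11
```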
Example: Encoder state diagram for encoder #2 with

G(D) = | 1 + D   D   1 + D |
       |   D     1     1   |

and block diagram as shown in Figure 6. This encoder also has M = 2, but each of the two memory cells receives its input at time t from a different data stream. The following convention is used to label the 4 possible states (the first bit corresponds to the upper memory cell in Figure 6, the second bit to the lower):

S0 = 00 , S1 = 10 , S2 = 01 , S3 = 11 .

Because the encoder has rate R = 2/3, the transitions in the encoder state diagram from time t to time t + 1 are now labeled with u_t^(1) u_t^(2)/c_t^(1) c_t^(2) c_t^(3). The result is shown in the next figure.
Fig.9 Encoder State Diagram for Binary Rate 2/3 Encoder with K = 2. The transitions (labeled u_t^(1) u_t^(2)/c_t^(1) c_t^(2) c_t^(3)) are:

from S0: 00/000 → S0 , 10/101 → S1 , 01/011 → S2 , 11/110 → S3
from S1: 00/111 → S0 , 10/010 → S1 , 01/100 → S2 , 11/001 → S3
from S2: 00/100 → S0 , 10/001 → S1 , 01/111 → S2 , 11/010 → S3
from S3: 00/011 → S0 , 10/110 → S1 , 01/000 → S2 , 11/101 → S3
Example: The figure on the next slide shows the encoder state diagram for encoder #4 whose block diagram was given in Figure 7. This encoder has rational transfer function matrix

G(D) = [ 1   (1 + D + D^3)/(1 + D^2 + D^3)   (1 + D + D^2 + D^3)/(1 + D^2 + D^3) ] ,

and M = 3. The encoder states are labeled using the following convention (the leftmost bit corresponds to the leftmost memory cell in Figure 7):

S0 = 000 , S1 = 100 , S2 = 010 , S3 = 110 ,
S4 = 001 , S5 = 101 , S6 = 011 , S7 = 111 .
Fig.10 Encoder State Diagram for R = 1/3, K = 4 Systematic Encoder (states S0, . . . , S7; transitions labeled u_t/c_t^(1) c_t^(2) c_t^(3))
Trellis Diagrams
Because the convolutional encoders considered here are time-invariant, the encoder state diagram describes their behavior for all times t. But sometimes, e.g., for decoding convolutional codes, it is convenient to show all possible states of an encoder separately for each time t (measured in frames), together with all possible transitions from states at time t to states at time t + 1. The resulting diagram is called a trellis diagram.
Example: For encoder #1 with G(D) = [1 + D^2   1 + D + D^2] and M = 2 (and thus 4 states) the trellis diagram is shown in the figure on the next slide.
Fig.11 Trellis Diagram for Binary Rate 1/2 Encoder with K = 3: the states S0, S1, S2, S3 are drawn as columns of nodes at t = 0, 1, 2, 3, 4, 5, . . . ; each branch from a state at time t to a state at time t + 1 is labeled with its code frame (00, 11, 01, 10)
Note that the trellis always starts with the all-zero state S0 at time t = 0 as the root node. This corresponds to the convention that convolutional encoders must be initialized to the all-zero state before they are first used. The labels on the branches are the code frames that the encoder outputs when that particular transition from a state at time t to a state at time t + 1 is made in response to a data symbol u_t. The highlighted path in Figure 11, for example, corresponds to the data sequence u = (1, 1, 0, 1, 0, . . .) and the resulting code sequence
c = (11, 10, 10, 00, 01, . . .) .
Viterbi Decoding Algorithm
In its simplest and most common form, the Viterbi algorithm is a maximum likelihood (ML) decoding algorithm for convolutional codes. Recall that a ML decoder outputs the estimate ĉ = c_i iff i is the index (or one of them selected at random if there are several) which maximizes the expression p_{Y|X}(v|c_i) over all codewords c0, c1, c2, . . . . The conditional pmf p_{Y|X} defines the channel model with input X and output Y which is used, and v is the received (and possibly corrupted) codeword at the output of the channel. For the important special case of memoryless channels used without feedback, the computation of p_{Y|X} can be considerably simplified and brought into a form where metrics along the branches of a trellis can be added up and then a ML decision can be obtained by comparing these sums. In a nutshell, this is what the Viterbi algorithm does.
Definition: A channel with input X and output Y is said to be memoryless if

p(y_j | x_j, x_{j−1}, . . . , x_0, y_{j−1}, . . . , y_0) = p_{Y|X}(y_j | x_j) .

Definition: A channel with input X and output Y is used without feedback if

p(x_j | x_{j−1}, . . . , x_0, y_{j−1}, . . . , y_0) = p(x_j | x_{j−1}, . . . , x_0) .

Theorem: For a memoryless channel used without feedback

p_{Y|X}(y|x) = ∏_{j=0}^{N−1} p_{Y|X}(y_j | x_j) ,

where N is the length of the channel input and output vectors X and Y.
Proof: Left as an exercise.
Definition: The ML decoding rule at the output Y of a discrete memoryless channel (DMC) with input X, used without feedback, is: Output code sequence estimate ĉ = c_i iff i maximizes the likelihood function

p_{Y|X}(v|c_i) = ∏_{j=0}^{N−1} p_{Y|X}(v_j | c_{ij}) ,

over all code sequences c_i = (c_{i0}, c_{i1}, c_{i2}, . . .) for i = 0, 1, 2, . . . . The pmf p_{Y|X} is given by specifying the transition probabilities of the DMC and the v_j are the received symbols at the output of the channel. For block codes N is the blocklength of the code. For convolutional codes we set N = n (L + m), where L is the number of data frames that are encoded and m is the maximal memory order of the encoder.
Definition: The log likelihood function of a received sequence v at the channel output with respect to code sequence c_i is the expression

log[p_{Y|X}(v|c_i)] = ∑_{j=0}^{N−1} log[p_{Y|X}(v_j | c_{ij})] ,

where the logarithm can be taken to any base.
Definition: The path metric μ(v|c_i) for a received sequence v given a code sequence c_i is computed as

μ(v|c_i) = ∑_{j=0}^{N−1} μ(v_j | c_{ij}) ,

where the symbol metrics μ(v_j | c_{ij}) are defined as

μ(v_j | c_{ij}) = α ( log[p_{Y|X}(v_j | c_{ij})] + f(v_j) ) .

Here α is any positive number and f(v_j) is a completely arbitrary real-valued function defined over the channel output alphabet B. Usually, one selects for every y ∈ B

f(y) = − log[ min_{x∈A} p_{Y|X}(y|x) ] ,

where A is the channel input alphabet. In this way the smallest symbol metric will always be 0. The quantity α is then adjusted so that all nonzero metrics are (approximated by) small positive integers.
Example: A memoryless BSC with transition probability ε < 0.5 is characterized by

p_{Y|X}(v|c)          v = 0    v = 1
c = 0                 1 − ε    ε
c = 1                 ε        1 − ε
min_c p_{Y|X}(v|c)    ε        ε

Thus, setting f(v) = − log[min_c p_{Y|X}(v|c)] yields

f(0) = f(1) = − log ε .
With this, the bit metrics become

μ(v|c)   v = 0                    v = 1
c = 0    α(log(1−ε) − log ε)      0
c = 1    0                        α(log(1−ε) − log ε)

Now choose α as

α = 1 / (log(1−ε) − log ε) ,

so that the following simple bit metrics for the BSC with ε < 0.5 are obtained

μ(v|c)   v = 0   v = 1
c = 0    1       0
c = 1    0       1
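A numerical sketch of this normalization (the function name is illustrative; any ε < 0.5 works):

```python
import math

def bsc_bit_metrics(eps):
    """Normalized bit metrics for a BSC: mu(v|c) = alpha * (log p(v|c) + f(v))."""
    p = {(0, 0): 1 - eps, (0, 1): eps, (1, 0): eps, (1, 1): 1 - eps}  # (v, c) -> p(v|c)
    f = {v: -math.log(min(p[v, 0], p[v, 1])) for v in (0, 1)}         # f(v) = -log min_c p(v|c)
    alpha = 1.0 / (math.log(1 - eps) - math.log(eps))
    return {(v, c): alpha * (math.log(p[v, c]) + f[v]) for v, c in p}

mu = bsc_bit_metrics(0.1)
# mu[(0,0)] = mu[(1,1)] = 1 and mu[(0,1)] = mu[(1,0)] = 0, up to floating-point rounding
```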
Definition: The partial path metric μ^(t)(v|c_i) at time t, t = 1, 2, . . . , for a path, a received sequence v, and a given code sequence c_i, is computed as

μ^(t)(v|c_i) = ∑_{ℓ=0}^{t−1} μ(v^(ℓ) | c_i^(ℓ)) = ∑_{j=0}^{tn−1} μ(v_j | c_{ij}) ,

where the branch metrics μ(v^(ℓ) | c_i^(ℓ)) of the ℓ-th branch, ℓ = 0, 1, 2, . . . , for v and a given c_i are defined as

μ(v^(ℓ) | c_i^(ℓ)) = ∑_{j=ℓn}^{(ℓ+1)n−1} μ(v_j | c_{ij}) .
The Viterbi algorithm makes use of the trellis diagram to compute the partial path metrics μ^(t)(v|c_i) at times t = 1, 2, . . . , N for a received v, given all code sequences c_i that are candidates for a ML decision, in the following well defined and organized manner.

(1) Every node in the trellis is assigned a number that is equal to the partial path metric of the path that leads to this node. By definition, the trellis starts in state 0 at t = 0 and μ^(0)(v|c_i) = 0.

(2) For every transition from time t to time t + 1, all q^(M+k) (there are q^M states and q^k different input frames at every time t) t-th branch metrics μ(v^(t) | c_i^(t)) for v given all t-th code frames are computed.
(3) The partial path metric µ^(t+1)(v|c_i) is updated by adding the t-th branch metrics to the previous partial path metrics µ^(t)(v|c_i) and keeping only the maximum value of the partial path metric for each node in the trellis at time t + 1. The partial path that yields the maximum value at each node is called the survivor, and all other partial paths leading into the same node are eliminated from further consideration as ML decision candidates. Ties are broken by flipping a coin.
(4) If t + 1 = N = L + m (so that all n(L + m) code symbols have been processed; L is the number of data frames that are encoded and m is the maximal memory order of the encoder), then there is only one survivor with maximum path metric µ(v|c_i) = µ^(N)(v|c_i), and thus ĉ = c_i is announced and the decoding algorithm stops. Otherwise, set t ← t + 1 and return to step (2).
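A single add-compare-select update, the core of steps (2) and (3), can be sketched as follows (a minimal sketch; the helper names are ours, and the numbers correspond to state S0 at t = 3 in the BSC example below, where the received frame is 00 and the two competing predecessors arrive with metrics 2 and 1 over branches labeled 00 and 11):

```python
mu = lambda v, c: 1 if v == c else 0   # 0/1 BSC bit metric from earlier

def acs(candidates):
    """candidates: (previous partial path metric, branch code frame,
    received frame) for every branch entering the node; the survivor's
    partial path metric is the maximum over the candidates."""
    return max(m + sum(mu(vj, cj) for vj, cj in zip(r, out))
               for m, out, r in candidates)

survivor = acs([(2, (0, 0), (0, 0)),    # from S0: 2 + 2 = 4  -> survivor
                (1, (1, 1), (0, 0))])   # from S2: 1 + 0 = 1  -> eliminated
```

The survivor's metric here is 4; repeating this update for every node at every time step is all the Viterbi algorithm does.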
Theorem: The path with maximum path metric µ(v|ci ) selected bythe Viterbi decoder is the maximum likelihood path.
Proof: Suppose c_i is the ML path, but the decoder outputs ĉ = c_j ≠ c_i. This implies that at some time t the partial path metrics satisfy µ^(t)(v|c_j) ≥ µ^(t)(v|c_i) and c_i is not a survivor. Appending the remaining portion of the path that corresponds to c_i to the survivor c_j at time t then results in a path metric that is at least as large as the one of the ML path c_i. But this contradicts the assumption that c_i is the ML path. QED
Example: Encoder #1 (binary R = 1/2, K = 3 encoder with G(D) = [1 + D^2, 1 + D + D^2]) was used to generate and transmit a codeword over a BSC with transition probability ε < 0.5. The following sequence was received:
v = (10, 10, 00, 10, 10, 11, 01, 00, . . .) .
To find the most likely codeword c that corresponds to this v, usethe Viterbi algorithm with the trellis diagram shown in Figure 12.
[Fig.12 Viterbi Decoder: R = 1/2, K = 3 Encoder, Transmission over BSC. Trellis over states S0–S3 for the received sequence v = 10 10 00 10 10 11 01 00; each node is labeled with its partial path metric (ties marked with a dot), eliminated paths are marked with an X, and the surviving ML path, with metric 13, is highlighted.]
At time zero start in state S0 with a partial path metric µ^(0)(v|c_i) = 0. Using the bit metrics for the BSC with ε < 0.5 given earlier, the branch metrics for each of the first two branches are 1. Thus, the partial path metrics at time t = 1 are µ^(1)(10|00) = 1 and µ^(1)(10|11) = 1.
Continuing to add the branch metrics µ(v^(1)|c_i^(1)), the partial path metrics µ^(2)((10,10)|(00,00)) = 2, µ^(2)((10,10)|(00,11)) = 2, µ^(2)((10,10)|(11,01)) = 1, and µ^(2)((10,10)|(11,10)) = 3 are obtained at time t = 2. At time t = 3 things become more interesting. Now two branches enter into each state and only the one that results in the larger partial path metric is kept; the other one is eliminated (indicated with an "X"). Thus, for instance, since 2 + 2 = 4 > 1 + 0 = 1, µ^(3)((10,10,00)|(00,00,00)) = 4, whereas the alternative path entering S0 at t = 3 would only result in µ^(3)((10,10,00)|(11,01,11)) = 1. Similarly, for the two paths entering S1 at t = 3 one finds either µ^(3)((10,10,00)|(00,00,11)) = 2 or µ^(3)((10,10,00)|(11,01,00)) = 3, and therefore the latter path and corresponding partial path metric survive. If there is a tie, e.g., as in the case of the two paths entering S0 at time t = 4, then one of the two paths is selected as survivor at random. In Figure 12 ties are marked with a dot following the value of the partial path metric. Using the partial path metrics at time t = 8, the ML decision at this time is to choose the codeword corresponding to the path with metric 13 (highlighted in Figure 12), i.e.,
c = (11, 10, 01, 10, 11, 11, 01, 00, . . .) =⇒ u = (1, 1, 1, 0, 0, 1, 0, 1, . . .) .
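The whole computation of Figure 12 can be reproduced with a short program. The sketch below (the function name `viterbi1` is ours) implements the Viterbi algorithm for encoder #1 with the 0/1 BSC bit metrics, keeping one survivor per state; ties are broken in favor of the first candidate rather than by a coin flip, which does not matter here because no ties occur on the winning path:

```python
def viterbi1(frames, mu):
    """Viterbi decoder for encoder #1, G(D) = [1+D^2, 1+D+D^2].
    State is (u_{t-1}, u_{t-2}); survivors hold (path metric, data bits)."""
    surv = {(0, 0): (0, [])}                  # start in state 0 with metric 0
    for r in frames:                          # one trellis section per frame
        new = {}
        for (a, b), (m, bits) in surv.items():
            for u in (0, 1):
                out = ((u + b) % 2, (u + a + b) % 2)        # code frame
                cand = (m + mu(r[0], out[0]) + mu(r[1], out[1]), bits + [u])
                s = (u, a)                                  # next state
                if s not in new or cand[0] > new[s][0]:     # add-compare-select
                    new[s] = cand
        surv = new
    return max(surv.values())                 # best final survivor

mu_bsc = lambda v, c: 1 if v == c else 0      # hard decision bit metric
v = [(1,0), (1,0), (0,0), (1,0), (1,0), (1,1), (0,1), (0,0)]
metric, u = viterbi1(v, mu_bsc)
```

Running this yields `metric == 13` and `u == [1, 1, 1, 0, 0, 1, 0, 1]`, matching the ML decision above.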
Definition: A channel whose output alphabet is the same as theinput alphabet is said to make hard decisions, whereas a channelthat uses a larger alphabet at the output than at the input is saidto make soft decisions.
Note: In general, a channel which gives more differentiated output information is preferred (and has a larger capacity) over one which has the same number of output symbols as there are input symbols, as for example the BSC.
Definition: A decoder that operates on hard decision channeloutputs is called a hard decision decoder, and a decoder thatoperates on soft decision channel outputs is called a soft decisiondecoder.
Example: Use again encoder #1, but this time with a softdecision channel model with 2 inputs and 5 outputs as shown inthe following figure.
[Fig.13 Discrete Memoryless Channel (DMC) with 2 Inputs and 5 Outputs: input X ∈ {0, 1}, output Y ∈ {0, @, ∆, !, 1}.]
The symbols “@” and “!” at the channel output represent “bad”0’s and “bad” 1’s, respectively, whereas “∆” is called an erasure(i.e., it is uncertain whether ∆ is closer to a 0 or a 1, whereas abad 0, for example, is closer to 0 than to 1). The transitionprobabilities for this channel are
p_{Y|X}(v|c)   v = 0   v = @   v = ∆   v = !   v = 1
c = 0          0.5     0.2     0.14    0.1     0.06
c = 1          0.06    0.1     0.14    0.2     0.5

After taking (base 2) logarithms

log2[p_{Y|X}(v|c)]            v = 0   v = @   v = ∆   v = !   v = 1
c = 0                         −1.00   −2.32   −2.84   −3.32   −4.06
c = 1                         −4.06   −3.32   −2.84   −2.32   −1.00
−log2[min_c p_{Y|X}(v|c)]      4.06    3.32    2.84    3.32    4.06
Using

µ(v|c) = α(log2[p_{Y|X}(v|c)] − log2[min_c p_{Y|X}(v|c)])

with α = 1 and rounding to the nearest integer yields the bit metrics

µ(v|c)   v = 0   v = @   v = ∆   v = !   v = 1
c = 0    3       1       0       0       0
c = 1    0       0       0       1       3
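The integer bit metric table above can be reproduced directly from the transition probabilities. The code below is our own check (the erasure symbol ∆ is written as `'E'`; rounding is to the nearest integer, as in the text):

```python
import math

# Transition probabilities p(v|c) of the 2-input, 5-output DMC
P = {0: {'0': 0.5,  '@': 0.2, 'E': 0.14, '!': 0.1, '1': 0.06},
     1: {'0': 0.06, '@': 0.1, 'E': 0.14, '!': 0.2, '1': 0.5}}

# mu(v|c) = alpha*(log2 p(v|c) - log2 min_c p(v|c)) with alpha = 1, rounded
MU = {c: {v: round(math.log2(P[c][v]) - math.log2(min(P[0][v], P[1][v])))
          for v in P[c]} for c in (0, 1)}
```

This reproduces the table: `MU[0] == {'0': 3, '@': 1, 'E': 0, '!': 0, '1': 0}` and `MU[1] == {'0': 0, '@': 0, 'E': 0, '!': 1, '1': 3}`.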
The received sequence
v = (11, !@,@0,∆0, 1!, 00,∆0, 10, . . .) ,
can now be decoded using the Viterbi algorithm as shown in Figure 14 on the next page.
[Fig.14 Viterbi Decoder: R = 1/2, K = 3 Encoder, 2-Input, 5-Output DMC. Trellis over states S0–S3 for the received sequence v = 11 !@ @0 ∆0 1! 00 ∆0 10; each node is labeled with its partial path metric, eliminated paths are marked with an X, and the surviving ML path, with metric 31, is highlighted.]
Clearly, the Viterbi algorithm can be used either for hard or soft decision decoding by using appropriate bit metrics. In this example the ML decision (up to t = 8) is
c = (11, 01, 00, 10, 10, 00, 10, 10, . . .) ,
corresponding to u = (1, 0, 1, 1, 0, 1, 1, 0, . . .).
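The soft decision decoding of Figure 14 can also be checked in code. The sketch below (names are ours; the erasure ∆ is again written as `'E'`) runs the same Viterbi recursion for encoder #1, but with the integer DMC bit metrics, and recovers the final path metric 31 together with the data sequence above:

```python
# Integer bit metrics MU[c][v] for the 2-input, 5-output DMC (from the text)
MU = {0: {'0': 3, '@': 1, 'E': 0, '!': 0, '1': 0},
      1: {'0': 0, '@': 0, 'E': 0, '!': 1, '1': 3}}

def viterbi1(frames):
    """Viterbi decoder for encoder #1, G(D) = [1+D^2, 1+D+D^2];
    state is (u_{t-1}, u_{t-2}), survivors hold (path metric, data bits)."""
    surv = {(0, 0): (0, [])}
    for r in frames:                          # r is a 2-symbol received frame
        new = {}
        for (a, b), (m, bits) in surv.items():
            for u in (0, 1):
                out = ((u + b) % 2, (u + a + b) % 2)        # code frame
                cand = (m + MU[out[0]][r[0]] + MU[out[1]][r[1]], bits + [u])
                s = (u, a)                                  # next state
                if s not in new or cand[0] > new[s][0]:     # add-compare-select
                    new[s] = cand
        surv = new
    return max(surv.values())

v = ['11', '!@', '@0', 'E0', '1!', '00', 'E0', '10']
metric, u = viterbi1(v)
```

Running this yields `metric == 31` and `u == [1, 0, 1, 1, 0, 1, 1, 0]`, matching Figure 14; only the bit metric table changed relative to the hard decision decoder.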