Fast parallel algorithms for decoding Reed-Solomon codes based on remainder polynomials

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 41, NO. 4, JULY 1995 873

Dariush Dabiri and Ian F. Blake, Fellow, IEEE

Abstract: The problem of decoding cyclic error-correcting codes is one of solving a constrained polynomial congruence, often achieved using the Berlekamp-Massey or the extended Euclidean algorithm on a key equation involving the syndrome polynomial. A module-theoretic approach to the solution of polynomial congruences is developed here using the notion of exact sequences. This technique is applied to the Welch-Berlekamp key equation for decoding Reed-Solomon codes, for which the computation of syndromes is not required. It leads directly to new and efficient parallel decoding algorithms that can be realized with a systolic array. The architectural issues for one of these parallel decoding algorithms are examined in some detail.

Index Terms: Reed-Solomon codes, decoding algorithms, systolic arrays, Welch-Berlekamp equations, modules.

I. INTRODUCTION

Reed-Solomon (RS) codes have found increasing application in such areas as telecommunications, data storage and transmission, and video systems. They have been adopted for several international standards and continue to be attractive for new applications. The development of efficient decoding algorithms for these codes is thus of considerable importance, particularly for special-purpose and high-speed applications where conventional decoding algorithms and commercially available chip sets give limited performance. This is the subject of the present work.

Let $C$ be a cyclic $[n, k = n - r, d = r + 1]$ RS code over the finite field $\mathbb{F}_q$ of length $n = q - 1$, dimension $k$, redundancy $r$, and minimum distance $d = r + 1$. For $\alpha$ a fixed primitive element of $\mathbb{F}_q$, label the $i$th coordinate position of the code by $y_i = \alpha^{i-1}$, and for $c = (c_0, c_1, \ldots, c_{n-1}) \in C$ denote

$$c_{i-1} = c[\alpha^{i-1}] = c[y_i], \quad i = 1, \ldots, n.$$

The codeword polynomial associated with $c \in C$ is

$$c(z) = \sum_{i=0}^{n-1} c_i z^i.$$

Manuscript received September 15, 1994; revised December 10, 1994. This work was supported by the Natural Sciences and Engineering Research Council of Canada under Grant A 7382. The material in this paper was presented in part at the International Symposium on Information Theory, Trondheim, Norway, 1994.

The authors are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ont., Canada N2L 3G1.

IEEE Log Number 9411962.

Let the generator polynomial of the code be

$$g(z) = \prod_{i=1}^{r} (z - y_i)$$

and assume the code is systematically encoded with check positions $y_i$, $i = 1, \ldots, r$.

The usual approach to decoding RS codes requires the computation of syndromes

$$S_i = R(\alpha^i) = C(\alpha^i) + E(\alpha^i) = E(\alpha^i), \quad i = 0, 1, \ldots, r - 1$$

where $C(z)$ is the transmitted codeword, $E(z)$ an error sequence over $\mathbb{F}_q$, and $R(z)$ the received word. The syndrome polynomial is

$$S(z) = \sum_{i=0}^{r-1} S_i z^i$$

and the usual form of key equation to be solved for decoding is

$$\sigma(z) S(z) \equiv \omega(z) \bmod z^{2t}, \quad \deg \sigma \le t, \quad \deg \omega < \deg \sigma$$

where the error locations are related to the zeros of $\sigma(z)$ and the error values are related to the values of $\omega(z)$ at the error locations.
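As a numerical illustration of the syndrome definition, the following Python sketch works over a toy prime field. Everything concrete in it is an assumption made for illustration: the field GF(13) (so $n = q - 1 = 12$), the primitive element $\alpha = 2$, $r = 6$ check symbols, the message, and the two-error pattern. It builds a codeword as a multiple $m(z)g(z)$ of the generator polynomial rather than systematically, since the syndromes depend only on the error pattern.

```python
P, ALPHA = 13, 2          # assumption: GF(13) with primitive element 2 (order 12)
N_LEN, R_CHK = P - 1, 6   # n = q - 1 = 12 and r = 2t = 6


def norm(a):
    """Reduce coefficients mod P and strip leading zeros (lists are low-to-high)."""
    a = [x % P for x in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def pmul(a, b):
    out = [0] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return norm(out)


def peval(a, x):
    """Horner evaluation of a low-to-high coefficient list at x over GF(P)."""
    acc = 0
    for c in reversed(a):
        acc = (acc * x + c) % P
    return acc


g = [1]                                   # g(z) = prod_{i=1}^{r} (z - y_i), y_i = alpha^(i-1)
for i in range(R_CHK):
    g = pmul(g, [-pow(ALPHA, i, P), 1])

c = pmul([3, 1, 4, 1, 5, 9], g)           # a codeword: any multiple of g(z)
c += [0] * (N_LEN - len(c))
errors = {7: 5, 10: 11}                   # hypothetical error pattern: position -> value
R = [(c[i] + errors.get(i, 0)) % P for i in range(N_LEN)]

S = [peval(R, pow(ALPHA, i, P)) for i in range(R_CHK)]   # S_i = R(alpha^i)
```

Because $g(\alpha^i) = 0$ for $i = 0, \ldots, r-1$, the codeword part drops out and each $S_i$ equals $E(\alpha^i)$.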

A new type of decoding algorithm has recently been proposed which involves the solution of a set of equations, the Welch-Berlekamp (WB) equations, whose formulation does not require the computation of syndromes [1], [2]. Implementations of this algorithm have proved very efficient [2]. This paper develops a different technique for the solution of the WB equations that leads to an efficient parallel decoding algorithm for Reed-Solomon codes.

Since very little has appeared in the literature on the WB equations and their solution, a brief derivation of them is given in the next section. This material is based on the work of Berlekamp [1]. Section III derives a novel approach to the solution of polynomial congruences based on the notion of exact sequences of modules. Although theoretical in nature, this approach leads directly to the development of new and efficient parallel decoding algorithms, as discussed in Section IV. Architectural considerations for the implementation of one of these algorithms are developed in Section V.

0018-9448/95$04.00 © 1995 IEEE


II. THE WELCH-BERLEKAMP KEY EQUATIONS

As little has appeared in the literature on the WB equations, a brief derivation of them based on the work of Berlekamp [1] will be given here. Suppose that the codeword $C(z)$ is transmitted, and the word $R(z)$ received, with $e$ errors at locations $\gamma_1, \gamma_2, \ldots, \gamma_e$. Denote the error locator polynomial

$$\prod_{i=1}^{e} (z - \gamma_i)$$

by $W(z)$. At the first stage of a remainder decoder, the received word $R(z)$ is re-encoded by taking the message locations, assumed here to be the rightmost $k$ positions of the word $R(z)$, and re-encoding their contents. This yields the remainder polynomial $r(z) = R(z) \bmod g(z)$ of degree at most $r - 1$, where there should be no confusion between the two uses of $r$. Obviously, $r(z)$ and $R(z)$ belong to the same coset, which also contains the error pattern $E(z)$. The polynomial

$$r(z) = \sum_{j=0}^{r-1} r_j z^j$$

uniquely specifies the coset of the received word but is quite different from the usual power-sum syndrome. The coefficients $r_{i-1}$ are also denoted by $r[y_i]$, to stress the correspondence between $r_{i-1}$ and the check locations $y_i$, $i = 1, \ldots, r$. We recall a simple fact about the null space of Vandermonde matrices.

Lemma 1: Let $V$ be an $r \times m$ Vandermonde matrix

$$V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ u_1 & u_2 & \cdots & u_m \\ \vdots & \vdots & & \vdots \\ u_1^{r-1} & u_2^{r-1} & \cdots & u_m^{r-1} \end{bmatrix}$$

where $m > r$ and $\{u_i \mid i = 1, \ldots, m\}$ are arbitrary distinct elements of a field $\mathbb{K}$. For any $x$ in the null space of $V$, that is, $Vx = 0$, there exists a unique polynomial $N(z)$ of degree less than $m - r$ such that

$$x_i = \frac{N(u_i)}{F'(u_i)}, \quad i = 1, \ldots, m \qquad (1)$$

where

$$F(z) = \prod_{i=1}^{m} (z - u_i).$$

Proof: Assume that the polynomial $N(z)$ is of degree less than $m - r$. To prove that $x$, as defined in (1), is in the null space of $V$ it must be shown that

$$\sum_{i=1}^{m} u_i^j x_i = 0, \quad j = 0, \ldots, r - 1. \qquad (2)$$

Since $\deg(N) < m - r$,

$$z^j N(z) = \left(z^j N(z) \bmod F(z)\right), \quad j = 0, \ldots, r - 1.$$

Applying the Lagrange interpolation formula for the interpolation points in $\{u_1, u_2, \ldots, u_m\}$ we obtain

$$z^j N(z) = \sum_{i=1}^{m} \frac{u_i^j N(u_i)}{F'(u_i)} \cdot \frac{F(z)}{z - u_i}. \qquad (3)$$

Note that the degree of the left-hand side polynomial is less than $m - 1$, and therefore the coefficient of the monomial of degree $m - 1$ on the right-hand side must be zero. Thus

$$\sum_{i=1}^{m} u_i^j \frac{N(u_i)}{F'(u_i)} = 0, \quad j = 0, \ldots, r - 1.$$

Since the dimension of the null space of $V$ is $m - r$, and the dimension of the vector space of all polynomials of degree less than $m - r$ over $\mathbb{K}$ is $m - r$ as well, each vector in the null space of $V$ can be uniquely identified by a polynomial of degree less than $m - r$. □

The following is a simple derivation of the Welch-Berlekamp (WB) equations from the conventional key equation. Given the set of syndromes $\{S_0, S_1, S_2, \ldots, S_{r-1}\}$, the coefficients of the minimum shift-register realization for this sequence, that is, $\{\sigma_0, \sigma_1, \sigma_2, \ldots, \sigma_e\}$, are required, where

$$\sum_{i=0}^{e} \sigma_i S_{k-i} = 0, \quad k = e, e + 1, \ldots, r - 1.$$

Rewrite the above equation in terms of the coefficients of the error locator polynomial $W(z)$ (recall that $W_i = \sigma_{e-i}$):

$$\sum_{i=0}^{e} W_i S_{k-e+i} = 0, \quad k = e, e + 1, \ldots, r - 1$$

and replace $k - e$ by $k$; then

$$\sum_{i=0}^{e} W_i S_{k+i} = 0, \quad k = 0, 1, \ldots, r - e - 1.$$

Since $S_k = R(\alpha^k) = r(\alpha^k) = \sum_{j=0}^{r-1} r_j \alpha^{jk}$ for $k = 0, \ldots, r - 1$, it follows that

$$\sum_{j=0}^{r-1} r_j \alpha^{jk} \sum_{i=0}^{e} W_i \alpha^{ij} = 0, \quad k = 0, 1, \ldots, r - e - 1.$$

Define

$$f_j = r_j W(\alpha^j).$$

Consequently

$$\sum_{j=0}^{r-1} f_j \alpha^{jk} = 0, \quad k = 0, 1, \ldots, r - e - 1.$$

By Lemma 1 (with $m = r$ columns, $r - e$ rows, nodes $\alpha^0, \ldots, \alpha^{r-1}$, and $F(z) = g(z)$), there exists a polynomial $N(z)$ of degree less than $e$ such that $f_j = N(\alpha^j)/g'(\alpha^j)$. Therefore

$$r_j W(\alpha^j) = \frac{N(\alpha^j)}{g'(\alpha^j)}, \quad j = 0, \ldots, r - 1.$$
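The derivation above can be checked numerically. The sketch below keeps the same illustrative assumptions (GF(13), $\alpha = 2$, $r = 6$, hypothetical errors at two message positions). It computes the remainder $r(z) = E(z) \bmod g(z)$, forms the values $r_j W(\alpha^j) g'(\alpha^j)$, and interpolates them; by Lemma 1 the interpolant $N(z)$ must have degree less than $e$. Helper names such as `interpolate` are ad hoc and not taken from the paper.

```python
P, ALPHA = 13, 2          # assumption: GF(13) with primitive element 2
N_LEN, R_CHK = P - 1, 6   # n = 12, r = 6


def norm(a):
    a = [x % P for x in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def padd(a, b):
    m = max(len(a), len(b))
    return norm([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(m)])


def pmul(a, b):
    out = [0] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return norm(out)


def peval(a, x):
    acc = 0
    for c in reversed(a):
        acc = (acc * x + c) % P
    return acc


def pderiv(a):
    return norm([i * a[i] for i in range(1, len(a))])


def pmod(a, m):
    """Remainder of a(z) modulo a monic m(z)."""
    a = norm(a)
    while len(a) >= len(m):
        a = padd(a, [0] * (len(a) - len(m)) + [-a[-1] * x for x in m])
    return a


def interpolate(xs, ys):
    """Lagrange interpolation over GF(P): the unique polynomial of degree < len(xs)."""
    out = []
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis, denom = [1], 1
        for j, xj in enumerate(xs):
            if j != i:
                basis = pmul(basis, [-xj, 1])
                denom = denom * (xi - xj) % P
        out = padd(out, [yi * pow(denom, P - 2, P) * b for b in basis])
    return out


ys = [pow(ALPHA, j, P) for j in range(R_CHK)]   # check locations alpha^0, ..., alpha^(r-1)
g = [1]
for y in ys:
    g = pmul(g, [-y, 1])

errors = {7: 5, 10: 11}                         # hypothetical message-position errors
E = [errors.get(i, 0) for i in range(N_LEN)]
rem = pmod(E, g)                                # r(z) = R(z) mod g(z) = E(z) mod g(z)
rem += [0] * (R_CHK - len(rem))                 # coefficients r_0, ..., r_{r-1}

W = [1]                                         # W(z) = product of (z - gamma_i)
for pos in errors:
    W = pmul(W, [-pow(ALPHA, pos, P), 1])

gp = pderiv(g)
vals = [rem[j] * peval(W, ys[j]) * peval(gp, ys[j]) % P for j in range(R_CHK)]
N = interpolate(ys, vals)                       # N(alpha^j) = r_j W(alpha^j) g'(alpha^j)
print("deg N =", len(N) - 1, "; e =", len(errors))
```

Interpolating through all $r$ check locations would in general give a polynomial of degree up to $r - 1 = 5$; the drop to degree below $e = 2$ is exactly the content of the WB equations.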


Using the notation $r[y_i] = r_{i-1}$, the RS decoding can be formulated as a problem of solving the set of the WB equations

$$N(y_i) = W(y_i)\, r[y_i]\, g'(y_i), \quad i = 1, \ldots, r \qquad (5)$$

where

$$\deg(N) < t, \quad \deg(W) \le t. \qquad (6)$$

Moreover, it can be shown that the error values can be found as follows [1]. If $y_j$ is a message location, then

$$e(y_j) = -\frac{N(y_j)}{W'(y_j)\, g(y_j)}. \qquad (7)$$

If $y_j$ is a check location, then

$$e(y_j) = -\frac{N'(y_j)}{W'(y_j)\, g'(y_j)} + r[y_j]. \qquad (8)$$

III. MODULES OF SOLUTIONS

Our goal is to find the unique pair of polynomials $(W, N)$ which satisfies the WB equations (5) under the constraint (6). To this end, a characterization is given of the set of all solutions of (5), without the constraint (6), as elements of a module of rank two. Then any solution of (5) can be expressed as a $\mathbb{K}[z]$-linear combination of two basis solutions, including the solution $(W, N)$ sought, which satisfies the condition (6) on the degrees. Equivalently, one can put (5) in the form of a congruence [3]

$$N(z) \equiv W(z) f(z) \bmod g(z) \qquad (9)$$

by taking $f(z)$ to be any interpolating polynomial such that

$$f(y_i) = r[y_i]\, g'(y_i), \quad i = 1, \ldots, r.$$

It can be shown that the set of all pairs $(w(z), n(z))$ satisfying congruence (9), for a given pair of polynomials $f(z)$ and $g(z)$, forms a finitely generated $\mathbb{K}[z]$-module of rank two, $\mathbb{K} = \mathbb{F}_q$ [4], and that the set of all solutions can be expressed as $a(z)(1, f(z)) + b(z)(0, g(z))$ for some polynomials $a(z)$ and $b(z)$.

Consider for the moment the slightly more general $\mathbb{K}[z]$-module $\mathfrak{M}$ defined by the set of all pairs $(w(z), n(z))$ satisfying

$$D(z)w(z) + G(z)n(z) \equiv 0 \bmod g(z) \qquad (10)$$

where $D(z)$ and $G(z)$ are known polynomials in $\mathbb{K}[z]$. The relationship between the solutions of (10) and those of (9) is immediate by choosing $G(z) = 1$ and $D(z) = -f(z)$. Note that without loss of generality it can be assumed that $X = \gcd(G, D)$ and $g$ are relatively prime.

Some useful properties of this solution module $\mathfrak{M}$ are developed in this section. While the approach is somewhat abstract, it will be shown in the next section how it leads directly to interesting and efficient decoding algorithms.

Clearly $\mathfrak{M}$ is also a finitely generated module of rank two, and the following lemma provides an explicit expression for a basis.

Lemma 2: $\mathfrak{M}$ is a free $\mathbb{K}[z]$-module of rank two, and a basis is given by the rows of

$$\Phi(z) = \begin{bmatrix} g(z)T(z) & g(z)S(z) \\ -G(z)/X(z) & D(z)/X(z) \end{bmatrix}$$

where

$$\gcd(G(z), D(z)) = X(z), \quad \gcd(X(z), g(z)) = 1, \quad S(z)\frac{G(z)}{X(z)} + T(z)\frac{D(z)}{X(z)} = 1.$$

Proof: First, note that the two rows of $\Phi$ are elements of $\mathfrak{M}$. Consider the matrix

$$A = \begin{bmatrix} T(z) & S(z) \\ -G(z)/X(z) & D(z)/X(z) \end{bmatrix}$$

and let $(w(z), n(z))$ be a solution, i.e., an element of $\mathfrak{M}$. Since $\det A = 1$, there exists a pair of polynomials $(w^*(z), n^*(z))$ such that

$$(w(z), n(z)) = (w^*(z), n^*(z))\, A.$$

As $D(z)w(z) + G(z)n(z) \equiv 0 \bmod g(z)$, we have

$$X(z)\, w^*(z) \equiv 0 \bmod g(z).$$

Since $\gcd(X(z), g(z)) = 1$, then $g(z) \mid w^*(z)$, that is, there exists a polynomial $w^{**}(z)$ such that $w^*(z) = g(z)w^{**}(z)$. Therefore $(w(z), n(z)) = (w^{**}(z), n^*(z))\,\Phi(z)$, so the rows of $\Phi$ generate $\mathfrak{M}$; since $\det \Phi = g(z) \ne 0$, they are also linearly independent. □
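Lemma 2 is constructive: the Bezout coefficients $S(z)$ and $T(z)$ come from the extended Euclidean algorithm. The sketch below uses assumptions of our own for illustration: GF(13), $g(z)$ with roots $\alpha^0, \ldots, \alpha^5$ for $\alpha = 2$, and hypothetical $G(z) = (z + 1)(z + 3)$ and $D(z) = (z + 1)(z^2 + 1)$, so that $X(z) = z + 1$ is coprime to $g(z)$. It builds $\Phi(z)$ and checks that both rows satisfy (10) and that $\det \Phi = g(z)$.

```python
P = 13  # assumption: GF(13); polynomials are low-to-high coefficient lists


def norm(a):
    a = [x % P for x in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def padd(a, b):
    m = max(len(a), len(b))
    return norm([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(m)])


def pscale(a, c):
    return norm([c * x for x in a])


def pmul(a, b):
    out = [0] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return norm(out)


def pdivmod(a, b):
    """Quotient and remainder of a(z) / b(z) over GF(P)."""
    a, b = norm(a), norm(b)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], P - 2, P)
    while len(a) >= len(b):
        s, c = len(a) - len(b), a[-1] * inv % P
        q[s] = c
        a = padd(a, [0] * s + [-c * x for x in b])
    return norm(q), a


def ext_gcd(a, b):
    """Return (X, S, T) with S*a + T*b = X and X = gcd(a, b) monic."""
    r0, r1, s0, s1, t0, t1 = norm(a), norm(b), [1], [], [], [1]
    while r1:
        q, rem = pdivmod(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, padd(s0, pscale(pmul(q, s1), -1))
        t0, t1 = t1, padd(t0, pscale(pmul(q, t1), -1))
    inv = pow(r0[-1], P - 2, P)
    return pscale(r0, inv), pscale(s0, inv), pscale(t0, inv)


g = [1]
for i in range(6):                     # g(z) with roots alpha^0..alpha^5, alpha = 2
    g = pmul(g, [-pow(2, i, P), 1])
G = pmul([1, 1], [3, 1])               # hypothetical G(z) = (z + 1)(z + 3)
D = pmul([1, 1], [1, 0, 1])            # hypothetical D(z) = (z + 1)(z^2 + 1)
X, S, T = ext_gcd(G, D)                # S G + T D = X
Gx, Dx = pdivmod(G, X)[0], pdivmod(D, X)[0]
Phi = [[pmul(g, T), pmul(g, S)],       # rows of the Lemma 2 basis matrix
       [pscale(Gx, -1), Dx]]


def in_module(w, n):
    """True when D(z) w(z) + G(z) n(z) = 0 mod g(z), i.e. (w, n) is in M."""
    return pdivmod(padd(pmul(D, w), pmul(G, n)), g)[1] == []


det = padd(pmul(Phi[0][0], Phi[1][1]), pscale(pmul(Phi[0][1], Phi[1][0]), -1))
```

The determinant comes out as exactly $g(z)$ because $T D/X + S G/X = 1$, which is the $\alpha = 1$ case of Lemma 3 below.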

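It will be convenient to deal with the basis as a matrix.

Definition 1: Let $\{(\Psi_{1,1}(z), \Psi_{1,2}(z)), (\Psi_{2,1}(z), \Psi_{2,2}(z))\}$ be a basis for $\mathfrak{M}$, and define a basis matrix $\Psi$ of $\mathfrak{M}$ as follows:

$$\Psi = \begin{bmatrix} \Psi_{1,1}(z) & \Psi_{1,2}(z) \\ \Psi_{2,1}(z) & \Psi_{2,2}(z) \end{bmatrix}.$$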

The following lemma is useful when one wants to find a basis for the module $\mathfrak{M}$.

Lemma 3: i) For any basis matrix $\Psi$ of $\mathfrak{M}$ we have

$$\det(\Psi) = \alpha\, g(z)$$

where $\alpha$ is a nonzero element of $\mathbb{K}$.

ii) Conversely, if the rows of $\Theta \in \mathbb{K}[z]^{2 \times 2}$ are in $\mathfrak{M}$ and $\det \Theta = \alpha\, g(z)$ for some nonzero $\alpha \in \mathbb{K}$, then $\Theta$ is a basis matrix.

Proof: Note that statement i) holds for the basis matrix $\Phi$ of Lemma 2, whose determinant is $g(z)$. Now, let $\Psi'$ be any other basis matrix. It is clear there exist matrices $T, T' \in \mathbb{K}[z]^{2 \times 2}$ such that

$$\Psi' = T\Phi, \qquad \Phi = T'\Psi'.$$

Thus $T' = T^{-1}$, i.e., $T$ is invertible; then $\det T$ is a unit in $\mathbb{K}[z]$, i.e., $\det T \in \mathbb{K}$, implying that $\det \Psi' = \alpha\, g(z)$ for some nonzero $\alpha \in \mathbb{K}$. To prove statement ii), note that there exists a matrix $T \in \mathbb{K}[z]^{2 \times 2}$ such that

$$\Theta = T\Phi.$$

Then $\det \Theta = \det T \det \Phi$; therefore, $\det T = \alpha$. Hence $T$ is invertible and

$$\Phi = T^{-1}\Theta.$$

Then clearly $\Theta$ is a basis matrix. □

Returning to the specific case of the Welch-Berlekamp equations, the module of solutions of (5) can now be viewed as the intersection of the modules of solutions of each of the equations (5), where each one represents the kernel of a homomorphism. More specifically, the set of solutions to the equation

$$n(y_i) = w(y_i)\, r[y_i]\, g'(y_i)$$

represents the kernel of the homomorphism

$$\phi_i : \mathbb{K}[z]^2 \to \mathbb{K}, \qquad \phi_i(w(z), n(z)) = n(y_i) - w(y_i)\, r[y_i]\, g'(y_i).$$

Then any pair of polynomials in the kernel of the homomorphism $\phi_i$ yields a solution to the WB equation at $y_i$. As will be discussed in the next section, a basis of the module defined as the kernel of $\phi_i$ is required. Since the module of solutions of (5) is the intersection of these kernels, a technique to find a basis of the intersection of the $\operatorname{Ker}(\phi_i)$ is introduced here. This motivates the introduction of the following facts from ring theory, which prove to be an effective language for describing the algorithms. Let $R$ be a ring, and let $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ be $R$-modules. For module homomorphisms $f$ and $g$, the sequence

$$\mathcal{A} \xrightarrow{f} \mathcal{B} \xrightarrow{g} \mathcal{C}$$

is defined to be exact if $\operatorname{Im}(f) = \operatorname{Ker}(g)$. Intuitively, one can regard $f$ as a characterization of $\operatorname{Ker}(g)$. For example, let $\mathfrak{M}_i$, a module of rank two, be the kernel of $\phi_i$, with a $2 \times 2$ basis matrix $\Psi_i(z)$; then the following is an exact sequence

$$\mathbb{K}[z]^2 \xrightarrow{\psi_i} \mathbb{K}[z]^2 \xrightarrow{\phi_i} \mathbb{K}$$

where $\psi_i(w(z), n(z)) = (w(z), n(z))\,\Psi_i(z)$. The following lemma, which provides us with a simple way to characterize the intersection of two kernels, is in fact the cornerstone of our algorithms.

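Lemma 4: Let the sequence

$$\mathcal{A} \xrightarrow{\psi} \mathcal{B} \xrightarrow{\phi_1} \mathcal{C}$$

be exact, and let $\phi_2 : \mathcal{B} \to \mathcal{C}'$ be another module homomorphism. Then

$$\operatorname{Ker}(\phi_1) \cap \operatorname{Ker}(\phi_2) = \psi\left(\operatorname{Ker}(\phi_2 \circ \psi)\right). \qquad (11)$$

Proof: Obviously

$$\psi\left(\operatorname{Ker}(\phi_2 \circ \psi)\right) \subseteq \operatorname{Im}(\psi) = \operatorname{Ker}(\phi_1)$$

and

$$\psi\left(\operatorname{Ker}(\phi_2 \circ \psi)\right) \subseteq \operatorname{Ker}(\phi_2)$$

therefore

$$\psi\left(\operatorname{Ker}(\phi_2 \circ \psi)\right) \subseteq \operatorname{Ker}(\phi_1) \cap \operatorname{Ker}(\phi_2). \qquad (12)$$

Note that every $x \in \operatorname{Ker}(\phi_1) = \operatorname{Im}(\psi)$ can be written as $x = \psi(a)$ for some $a \in \mathcal{A}$; therefore, if also $x \in \operatorname{Ker}(\phi_2)$, then

$$\phi_2(\psi(a)) = 0, \quad \text{i.e.,} \quad a \in \operatorname{Ker}(\phi_2 \circ \psi)$$

and

$$\operatorname{Ker}(\phi_1) \cap \operatorname{Ker}(\phi_2) \subseteq \psi\left(\operatorname{Ker}(\phi_2 \circ \psi)\right). \qquad (13)$$

It follows from (12) and (13) that $\operatorname{Ker}(\phi_1) \cap \operatorname{Ker}(\phi_2) = \psi(\operatorname{Ker}(\phi_2 \circ \psi))$. □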

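The following corollary is a direct extension of this result.

Corollary 1: Let the sequence

$$\mathcal{A} \xrightarrow{\psi} \mathcal{B} \xrightarrow{\phi_1} \mathcal{C}$$

be exact, and let $\phi_i : \mathcal{B} \to \mathcal{C}'$, $i = 2, \ldots, r$, be module homomorphisms. Then

$$\operatorname{Ker}(\phi_1) \cap \operatorname{Ker}(\phi_2) \cap \cdots \cap \operatorname{Ker}(\phi_r) = \psi\left(\operatorname{Ker}(\phi_2 \circ \psi) \cap \operatorname{Ker}(\phi_3 \circ \psi) \cap \cdots \cap \operatorname{Ker}(\phi_r \circ \psi)\right).$$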

Our interest in these homomorphisms lies in the fact that, with their appropriate definition, an element in one of the kernels corresponds to one of the WB equations being satisfied. An element in the intersection of their kernels corresponds to all of the equations being satisfied. It remains to construct the homomorphisms in such a manner that the degree constraints will also be satisfied.


IV. DECODING ALGORITHMS

Using the results of the previous section, a general approach for the solution of the WB equations (5) will be developed. From this approach, two different decoding algorithms will be derived. Since our study of RS decoding algorithms is mainly motivated by architectural issues, the asymptotic complexities of the algorithms are not the target for any improvement. The emphasis instead is on the parallelism inherent in the algorithms, and the regularity and simplicity of the structures that process the input data.

In our approach, the set of solutions of the congruences (5) is viewed as an intersection of a number of modules, each corresponding to one or two of the WB equations. Lemma 4 is used recursively to obtain a representation of the module of solutions of all of the WB equations (5). An obvious expression for the module of solutions of the congruences (5) is as follows:

$$M = \operatorname{Ker}(\phi_1) \cap \operatorname{Ker}(\phi_2) \cap \cdots \cap \operatorname{Ker}(\phi_{2t})$$

where

$$\phi_i(w(z), n(z)) = n(y_i) - w(y_i)\, r[y_i]\, g'(y_i), \quad i = 1, \ldots, 2t.$$

Thus if $(w(z), n(z)) \in \operatorname{Ker}(\phi_i)$, then $(w(z), n(z))$ satisfies the WB equation at $z = y_i$, and a solution pair in $M$ satisfies all the equations. It remains to construct the composed homomorphisms so that the degree constraints are satisfied.

The first algorithm takes advantage of the above expansion to find the solution of the WB equations. In this way, the algorithm at each iteration solves one equation from the WB equations, and subsequent iterations preserve that solution while solving new equations. The computational advantage of this decomposition resides in the fact that for each homomorphism $\phi_i$, the construction of a suitable exact sequence is immediate. This property results in parallel-pipeline implementations, which will be discussed in the next section.

A slight (but important) generalization of this approach results from taking a suitable pair of homomorphisms $(\phi_{i_1}, \phi_{i_2})$ to form a new homomorphism from $\mathbb{K}[z]^2$ to $\mathbb{K}^2$. Again, a suitable basis for the kernel of this homomorphism can be found immediately. In this method the following expression for $M$ is used:

$$M = \operatorname{Ker}(\bar\phi_1) \cap \operatorname{Ker}(\bar\phi_2) \cap \cdots \cap \operatorname{Ker}(\bar\phi_t)$$

where

$$\bar\phi_i = (\phi_{i_1}, \phi_{i_2}), \quad i = 1, \ldots, t.$$

For this algorithm, each iteration solves two of the WB equations and preserves previous solutions at each new iteration. Both algorithms have parallel-pipeline implementations on linear systolic arrays, which distinguishes them from the WB algorithm.

A. A Single-Step Algorithm

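Consider the solution of the polynomial congruence (10) for the particular case of the WB equations, i.e., for given $D_i$ and $G_i$, consider the following sequence:

$$\mathbb{K}[z]^2 \xrightarrow{\psi_i} \mathbb{K}[z]^2 \xrightarrow{\phi_i} \mathbb{K} \qquad (14)$$

where

$$\phi_i(w(z), n(z)) = D_i\, w(y_i) + G_i\, n(y_i) \qquad (15)$$

and $\psi_i(w(z), n(z)) = (w(z), n(z))\,\Psi_i(z)$, with

$$\Psi_i(z) = \begin{bmatrix} -G_i & D_i \\ z - y_i & 0 \end{bmatrix}, \quad i = 1, 2, \ldots, 2t \qquad (16)$$

if $D_i \ne 0$, and

$$\Psi_i(z) = \begin{bmatrix} 1 & 0 \\ 0 & z - y_i \end{bmatrix}, \quad i = 1, 2, \ldots, 2t \qquad (17)$$

if $D_i = 0$ and $G_i \ne 0$. Recall that for the WB equations (5), the constants of interest will be $D_i = -r[y_i]\, g'(y_i)$ and $G_i = 1$, where $r[y_i] = r_{i-1}$ is the coefficient of $z^{i-1}$ in the remainder polynomial.

Further notice that if $(w(z), n(z))$ are such that $D_i w(y_i) + G_i n(y_i) = 0$ for all the zeros $y_i$ of $g(z)$, then, for suitable interpolating polynomials $D(z)$ and $G(z)$ with $D(y_i) = D_i$ and $G(y_i) = G_i$,

$$D(z)w(z) + G(z)n(z) \equiv 0 \bmod g(z).$$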

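Note that in the above the trivial case $D_i = G_i = 0$ has been excluded, since this corresponds to the $i$th equation being already satisfied. The following lemma proves that the sequence (14) is exact.

Lemma 5: The sequence

$$\mathbb{K}[z]^2 \xrightarrow{\psi_i} \mathbb{K}[z]^2 \xrightarrow{\phi_i} \mathbb{K}$$

is exact.

Proof: Consider the case $D_i \ne 0$. Obviously, the two rows of

$$\Psi_i(z) = \begin{bmatrix} -G_i & D_i \\ z - y_i & 0 \end{bmatrix}$$

are two elements of $\operatorname{Ker}(\phi_i)$, and

$$\det \Psi_i(z) = -D_i\,(z - y_i).$$

Therefore, by Lemma 3 (applied with $g(z)$ replaced by $z - y_i$), $\Psi_i$ is a basis matrix for the module $\operatorname{Ker}(\phi_i)$. From the definition of $\psi_i$ the same matrix is a basis for $\operatorname{Im}(\psi_i)$, and it follows that the sequence (14) is exact. The case $D_i = 0$ and $G_i \ne 0$ follows in the same manner. □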
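For a single WB equation the basis matrices (16) and (17) can be checked directly. This short sketch treats GF(13) and the sample triples $(D_i, G_i, y_i)$ as illustrative assumptions. It verifies that both rows of $\Psi_i$ lie in $\operatorname{Ker}(\phi_i)$ and that $\det \Psi_i$ equals $-D_i(z - y_i)$ or $z - y_i$, as Lemma 3 requires, by evaluating the (degree-one) determinant at every field element.

```python
P = 13  # assumption: GF(13); polynomials are low-to-high coefficient lists


def peval(a, x):
    acc = 0
    for c in reversed(a):
        acc = (acc * x + c) % P
    return acc


def phi(D, G, y, w, n):
    """phi_i(w, n) = D_i w(y_i) + G_i n(y_i), as in (15)."""
    return (D * peval(w, y) + G * peval(n, y)) % P


def kernel_basis(D, G, y):
    """Psi_i(z) as a 2x2 matrix of coefficient lists."""
    if D % P:
        return [[[-G % P], [D % P]], [[-y % P, 1], []]]   # (16): [[-G, D], [z - y, 0]]
    return [[[1], []], [[], [-y % P, 1]]]                 # (17): [[1, 0], [0, z - y]]


def det_at(M, x):
    """det M(x) for a 2x2 polynomial matrix M."""
    return (peval(M[0][0], x) * peval(M[1][1], x) - peval(M[0][1], x) * peval(M[1][0], x)) % P
```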

Lemmas 4 and 5 lead directly to an algorithm for finding a basis of M which includes the solution of the WB equations, i.e., the unique solution satisfying the degree constraints.

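To develop the algorithm, a homomorphism is chosen from the set of homomorphisms $\{\phi_1, \ldots, \phi_{2t}\}$, say $\phi_{j_1}$, such that $D_{j_1} \ne 0$, where the subscript on $j$ will represent the iteration number. By Lemma 5, the following sequence is exact:

$$\mathbb{K}[z]^2 \xrightarrow{\psi_{j_1}} \mathbb{K}[z]^2 \xrightarrow{\phi_{j_1}} \mathbb{K}.$$

Define

$$\phi^1_i = \phi_i \circ \psi_{j_1}, \quad i = 1, 2, \ldots, 2t \qquad (18)$$

where for the composition homomorphism the superscript represents the iteration number. Note that $\phi_{j_1} \circ \psi_{j_1} = 0$. By Lemma 4

$$M = \psi_{j_1}\left(\operatorname{Ker}(\phi^1_1) \cap \operatorname{Ker}(\phi^1_2) \cap \cdots \cap \operatorname{Ker}(\phi^1_{2t})\right).$$

The problem is thus reduced to finding a basis for

$$\operatorname{Ker}(\phi^1_1) \cap \operatorname{Ker}(\phi^1_2) \cap \cdots \cap \operatorname{Ker}(\phi^1_{2t}). \qquad (19)$$

It can be easily shown from (15), (16), and (18) that $\phi^1_i(w(z), n(z)) = D^1_i\, w(y_i) + G^1_i\, n(y_i)$ with $D^1_i = D_{j_1}G_i - G_{j_1}D_i$ and $G^1_i = D_i(y_i - y_{j_1})$; in particular

$$\begin{bmatrix} D^1_{j_1} \\ G^1_{j_1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

which means that $\operatorname{Ker}(\phi^1_{j_1}) = \mathbb{K}[z]^2$, and that subsequent iterations will preserve this condition. Therefore, the number of nontrivial modules in (19) has been reduced by one. One can interpret this operation as solving the equation corresponding to index $j_1$. At the next step, a single homomorphism, say $\phi^1_{j_2}$, such that $D^1_{j_2} \ne 0$, is chosen from the set of homomorphisms $\{\phi^1_1, \phi^1_2, \ldots, \phi^1_{2t}\}$. The above procedure is repeated, in order to obtain a new set of homomorphisms $\{\phi^2_i \mid i = 1, 2, \ldots, 2t\}$. Note that at this stage

$$\begin{bmatrix} D^2_{j_1} \\ G^2_{j_1} \end{bmatrix} = \begin{bmatrix} D^2_{j_2} \\ G^2_{j_2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

implying that $\operatorname{Ker}(\phi^2_{j_1}) = \operatorname{Ker}(\phi^2_{j_2}) = \mathbb{K}[z]^2$, and the equations corresponding to indices $j_1$ and $j_2$ are satisfied. This procedure can be repeated up to the point where

$$D^l_i = 0, \quad i = 1, 2, \ldots, 2t \qquad (20)$$

for some integer $l \le 2t$. Consider now the set of homomorphisms $\{\phi^l_1, \phi^l_2, \ldots, \phi^l_{2t}\}$, and define

$$M_l = \operatorname{Ker}(\phi^l_1) \cap \operatorname{Ker}(\phi^l_2) \cap \cdots \cap \operatorname{Ker}(\phi^l_{2t}).$$

By Lemma 4

$$M = \psi_{j_1} \circ \psi_{j_2} \circ \cdots \circ \psi_{j_l}(M_l)$$

where $M$ and $M_l$ are submodules of $\mathbb{K}[z]^2$. Notice that the composition mapping is viewed as a mapping of $\mathbb{K}[z]^2$ to itself, and both the modules $M$ and $M_l$ are viewed as submodules of $\mathbb{K}[z]^2$. At this point, for any pair of polynomials $(w(z), n(z)) \in M$

$$\phi_i(w(z), n(z)) = 0, \quad i = 1, 2, \ldots, 2t.$$

It remains to show that, from the manner in which the homomorphisms are defined, the degrees of the polynomials generated by these homomorphisms satisfy the constraints, and thus will yield the solution to the WB equations.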

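Note that $D^l_i = 0$, $i = 1, \ldots, 2t$. Thus $(1, 0) \in M_l$, and the image of $(1, 0)$ under $\psi \stackrel{\text{def}}{=} \psi_{j_1} \circ \psi_{j_2} \circ \cdots \circ \psi_{j_l}$ is in $M$. Define the $2 \times 2$ matrix $\Psi(z) \in \mathbb{K}[z]^{2 \times 2}$, corresponding to the homomorphism $\psi$, by

$$\psi(w(z), n(z)) = (w(z), n(z))\,\Psi(z)$$

where, for this equation, the pair of polynomials represented by $\psi(1, 0)$ is the first row vector of $\Psi(z)$, etc. The following lemma shows that the polynomials in the first row of this matrix yield the unique solution to the WB equations. It is noted that it is only necessary for the $D^l_i$ to be zero for this to be true, a fact that will be implicit in the algorithms to follow.

Lemma 6: Let $(\Psi_{1,1}(z), \Psi_{1,2}(z))$ be the image of $(1, 0)$ under the mapping $\psi$, i.e.,

$$(\Psi_{1,1}(z), \Psi_{1,2}(z)) = \psi(1, 0) = \psi_{j_1} \circ \psi_{j_2} \circ \cdots \circ \psi_{j_l}(1, 0).$$

Then $W(z) = \Psi_{1,1}(z)$ and $N(z) = \Psi_{1,2}(z)$ satisfy the WB equations (5) and the degree constraints (6).

Proof: The fact that $(\Psi_{1,1}, \Psi_{1,2})$ is a solution of the WB equations has already been shown. It is only required to show that the conditions on the degrees are satisfied. First an explicit formula for $\Psi$ is given:

$$\Psi = \begin{bmatrix} -G^{l-1}_{j_l} & D^{l-1}_{j_l} \\ z - y_{j_l} & 0 \end{bmatrix} \cdots \begin{bmatrix} -G_{j_1} & D_{j_1} \\ z - y_{j_1} & 0 \end{bmatrix}.$$

Define

$$\Psi^i = \begin{bmatrix} -G^{i-1}_{j_i} & D^{i-1}_{j_i} \\ z - y_{j_i} & 0 \end{bmatrix} \Psi^{i-1}, \quad \Psi^0 = I,$$

and denote the entries of $\Psi^i$ by $\Psi^i_{a,b}$. The following conditions are established by induction and follow an argument of Berlekamp [1] in the WB algorithm:

1) $\deg(\Psi^i_{1,1}) \le \lfloor i/2 \rfloor$; 2) $\deg(\Psi^i_{1,2}) \le \lfloor (i-1)/2 \rfloor$; 3) $\deg(\Psi^i_{2,1}) \le \lfloor (i+1)/2 \rfloor$; 4) $\deg(\Psi^i_{2,2}) \le \lfloor i/2 \rfloor$.

For $i = 1$ all of the above statements are trivial. Assume the statements are true for $i$. Since the second row of $\Psi^{i+1}$ is $(z - y_{j_{i+1}})$ times the first row of $\Psi^i$,

$$\deg(\Psi^{i+1}_{2,2}) \le \lfloor (i-1)/2 \rfloor + 1 = \lfloor (i+1)/2 \rfloor.$$

The other statements follow similarly. Thus in particular, since $l \le 2t$,

$$\deg(\Psi_{1,1}) \le t$$

and

$$\deg(\Psi_{1,2}) \le t - 1 < t. \qquad \square$$

It follows that $\Psi_{1,1}(z)$ of the previous lemma is the error locator polynomial and $\Psi_{1,2}(z)$ is related to the error evaluator polynomial. This algorithm will be referred to as the single-step algorithm for the $D_i$, $G_i$ given by the WB equations, and can be summarized as follows, where it is recalled once more that $D_i = -r[y_i]\, g'(y_i)$ and $G_i = 1$.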

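Algorithm 1:

1) Initialization: $\Psi^0_{1,1} = 1$; $\Psi^0_{1,2} = 0$; $\Psi^0_{2,1} = 0$; $\Psi^0_{2,2} = 1$; $D^0_i = D_i$, $G^0_i = G_i$; $s = 1$;

2) Choose $j_s$ such that $D^{s-1}_{j_s} \ne 0$; if there is no such $j_s$, stop;

3) Do in parallel for $k = 1$ to $2t$:

$$D^s_k = D^{s-1}_{j_s} G^{s-1}_k - G^{s-1}_{j_s} D^{s-1}_k, \qquad G^s_k = D^{s-1}_k\,(y_k - y_{j_s});$$

4) $$\Psi^s = \begin{bmatrix} -G^{s-1}_{j_s} & D^{s-1}_{j_s} \\ z - y_{j_s} & 0 \end{bmatrix} \Psi^{s-1};$$

5) $s = s + 1$, go to step 2.

Note that at each step of the algorithm, the indices $k$ with $D^s_k \ne 0$ represent WB equations yet to be solved. At step 2) of the algorithm it suffices to choose any $D^{s-1}_{j_s} \ne 0$, representing the next equation to be solved. Recall that the algorithm stops when $s = l + 1$, for some integer $l \le 2t$. It is observed that

$$D^s_{j_s} = G^s_{j_s} = 0$$

and that

$$\det \Psi^s(z) = \prod_{u=1}^{s} \left(-D^{u-1}_{j_u}\right)(z - y_{j_u}).$$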
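The following Python sketch runs Algorithm 1 end to end on a toy example. It is an illustration, not the authors' implementation; the field GF(13), $\alpha = 2$, $t = 3$, the message, and the error pattern are assumptions. Step 2 simply takes the first index with $D_k \ne 0$. After termination the roots of $\Psi_{1,1}$ give the error locations and (7) gives the error values; the errors are placed at message positions so that (7) applies.

```python
P, ALPHA = 13, 2          # assumption: GF(13) with primitive element 2
N_LEN, T = P - 1, 3       # n = 12, t = 3
R_CHK = 2 * T             # r = 2t


def norm(a):
    a = [x % P for x in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def padd(a, b):
    m = max(len(a), len(b))
    return norm([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(m)])


def pscale(a, c):
    return norm([c * x for x in a])


def pmul(a, b):
    out = [0] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return norm(out)


def peval(a, x):
    acc = 0
    for c in reversed(a):
        acc = (acc * x + c) % P
    return acc


def pderiv(a):
    return norm([i * a[i] for i in range(1, len(a))])


def pmod(a, m):
    """Remainder of a(z) modulo a monic m(z)."""
    a = norm(a)
    while len(a) >= len(m):
        a = padd(a, [0] * (len(a) - len(m)) + [-a[-1] * x for x in m])
    return a


Y = [pow(ALPHA, i, P) for i in range(N_LEN)]   # Y[i] = alpha^i labels position i (y_{i+1})
g = [1]
for i in range(R_CHK):
    g = pmul(g, [-Y[i], 1])


def algorithm1(rem):
    """Return ((Psi_11, Psi_12), l) for remainder coefficients rem = [r_0, ..., r_{r-1}]."""
    r = rem + [0] * (R_CHK - len(rem))
    gp = pderiv(g)
    D = [-r[k] * peval(gp, Y[k]) % P for k in range(R_CHK)]   # D_k = -r[y_k] g'(y_k)
    G = [1] * R_CHK                                           # G_k = 1
    psi = [[[1], []], [[], [1]]]                              # step 1: Psi^0 = identity
    steps = 0
    while True:
        live = [k for k in range(R_CHK) if D[k] != 0]
        if not live:                                          # step 2: no D_k != 0 left
            return psi[0], steps
        j = live[0]                                           # any such index may be chosen
        Dj, Gj, yj = D[j], G[j], Y[j]
        # step 3: phi_k <- phi_k o psi_j (the right-hand side uses the old D and G)
        D, G = ([(Dj * G[k] - Gj * D[k]) % P for k in range(R_CHK)],
                [D[k] * (Y[k] - yj) % P for k in range(R_CHK)])
        # step 4: Psi <- [[-G_j, D_j], [z - y_j, 0]] * Psi
        row1, row2 = psi
        psi = [[padd(pscale(row1[c], -Gj), pscale(row2[c], Dj)) for c in range(2)],
               [pmul([-yj, 1], row1[c]) for c in range(2)]]
        steps += 1


# Usage: a codeword m(z)g(z) hit by two hypothetical message-position errors.
c = pmul([3, 1, 4, 1, 5, 9], g)
errors = {7: 5, 10: 11}
R = [((c[i] if i < len(c) else 0) + errors.get(i, 0)) % P for i in range(N_LEN)]
(W, N), l = algorithm1(pmod(R, g))
roots = [i for i in range(N_LEN) if peval(W, Y[i]) == 0]
# Error values from (7): e = -N(y) / (W'(y) g(y)) at message locations.
Wp = pderiv(W)
values = {i: -peval(N, Y[i]) * pow(peval(Wp, Y[i]) * peval(g, Y[i]), P - 2, P) % P
          for i in roots}
```

The first row is determined only up to a nonzero scalar, which cancels in the ratio used for (7).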

Techniques for reducing the arithmetic complexity of Algorithm 1 are now considered. This algorithm uses a four-polynomial recursion, as does the original WB algorithm. Using the following lemma it will be shown how this can be reduced to a two-polynomial recursion, an important simplification.

Lemma 7: Define


If $y_i$ is an error location, then

(23)

Proof: Two cases are considered depending on whether the error is at a check or a message location. Suppose first the error is at a message location $y_i$. By (21) and (22) it is observed that

(24)

Notice that by construction the $y_{j_s}$ are check locations. Since $y_i$ is a message error location, $\Psi_{1,1}(y_i) = 0$, and

(25)

Recall from the definition of the error locator and evaluator polynomials that the error value is given by (7). The lemma follows after substituting $\Psi_{1,2}(y_i)$ from (25) in that equation.

Suppose now the error is at a check location $y_i$. Recall that $D^l_{j_s} = G^l_{j_s} = 0$, and therefore $\operatorname{Ker}(\phi^l_{j_s}) = \mathbb{K}[z]^2$ for $s = 1, \ldots, l$. Thus

$$\operatorname{Ker}(\phi_{j_1}) \cap \operatorname{Ker}(\phi_{j_2}) \cap \cdots \cap \operatorname{Ker}(\phi_{j_l}) = \psi(\mathbb{K}[z]^2).$$

It follows that

$$\psi(0, 1) = (\Psi_{2,1}(z), \Psi_{2,2}(z)) \in \operatorname{Ker}(\phi_{j_1}) \cap \operatorname{Ker}(\phi_{j_2}) \cap \cdots \cap \operatorname{Ker}(\phi_{j_l})$$

or equivalently

$$\Psi_{k,2}(y_{j_s}) = \Psi_{k,1}(y_{j_s})\, r[y_{j_s}]\, g'(y_{j_s}), \quad k = 1, 2, \quad s = 1, \ldots, l$$

since the polynomials of $\psi(1, 0)$ and $\psi(0, 1)$ are solutions to the original equations at these indices. By the previous equation for $k = 1$, if $\Psi_{1,1}(y_{j_s}) = 0$ then $\Psi_{1,2}(y_{j_s}) = 0$, and for $k = 2$, if $\Psi_{2,1}(y_{j_s}) = 0$ then $\Psi_{2,2}(y_{j_s}) = 0$. It follows that $\gcd(\Psi_{1,1}, \Psi_{2,1}) = 1$, that is, $\Psi_{2,1}(y_i) \ne 0$, since otherwise $(z - y_{j_s})^2 \mid \det(\Psi)$, which is not possible. A similar argument was used by Berlekamp [1] for another development. Similarly, it is observed that each check error location is one of the $y_{j_s}$. This follows since if $y_i$ is a check error location, then $\Psi_{1,1}(y_i) = 0$, from the WB equations $\Psi_{1,2}(y_i) = 0$, and hence, from the equation for the determinant, $(z - y_i) \mid \det(\Psi(z))$. It is interesting to observe that all check error locations then occur among the positions $\{y_{j_s}, s = 1, \ldots, l\}$. It follows that $h(y_i) \ne 0$.

Recall that the error value at location $y_i$ is given by (8). Note that $(h(z)\Psi_{2,1}(z), h(z)\Psi_{2,2}(z))$ is a solution of (5). For the check error location $y_i$, $(z - y_i) \mid \gcd(\Psi_{1,1}(z), g(z))$.

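Therefore, it is not required to compute the two polynomials $\Psi_{1,2}$ and $\Psi_{2,2}$. The next reduction in complexity is achieved by an extension of Berlekamp's idea of using queues in the WB algorithm. First, an interpretation of this idea from the perspective of this work is given. This is then extended from the idea of using one queue in the WB algorithm to a two index-set version of Algorithm 1. In essence, the use of index-sets is a technique to avoid performing unnecessary computation.

At the $s$th step of Algorithm 1, denote the set of all $k \in \{1, \ldots, 2t\}$ such that $D^s_k = G^s_k = 0$ by $\Omega^s$, and denote the set of all indices $k$ such that $D^s_k = 0$ and $G^s_k \ne 0$ by $\mathcal{C}^s$. Note that if $k \in \mathcal{C}^s$, then without loss of generality it can always be assumed that $G^s_k = 1$. Choose $j_s$ such that $D^{s-1}_{j_s} \ne 0$ (hence $j_s$ is neither in $\Omega^{s-1}$ nor in $\mathcal{C}^{s-1}$), and for $k \in \mathcal{C}^{s-1}$ use $D^{s-1}_k = 0$ and $G^{s-1}_k = 1$ in step 3 to yield $G^s_k = 0$ and $D^s_k = D^{s-1}_{j_s}$.

Thus for all $k \in \mathcal{C}^{s-1}$, $G^s_k = 0$ and $D^s_k = D^{s-1}_{j_s} \ne 0$, and hence $k$ is not in $\mathcal{C}^s$. Similarly, without loss of generality, it is assumed that for all $k \in \mathcal{C}^{s-1}$, $D^s_k = 1$. Therefore, at the $(s+1)$st step of Algorithm 1, if $\mathcal{C}^{s-1} \ne \emptyset$, then $j_{s+1}$ can be chosen from $\mathcal{C}^{s-1}$. Assume that $j_{s+1} \in \mathcal{C}^{s-1}$. For $k \in \mathcal{C}^{s-1}$ and $k \ne j_{s+1}$ it follows that

$$D^{s+1}_k = D^s_{j_{s+1}} G^s_k - G^s_{j_{s+1}} D^s_k = 0, \qquad G^{s+1}_k = D^s_k\,(y_k - y_{j_{s+1}}) = y_k - y_{j_{s+1}} \qquad (28)$$

and so $D^{s+1}_k = 0$ and $G^{s+1}_k \ne 0$, i.e., $k \in \mathcal{C}^{s+1}$, while $j_{s+1} \in \Omega^{s+1}$.

The above discussion leads to the following lemma, where it is convenient to adjust the iteration counter by one.

Lemma 8: In Algorithm 1, if whenever $\mathcal{C}^{s-1} \ne \emptyset$ we choose $j_{s+1} \in \mathcal{C}^{s-1}$, then

1) $j_{s+1} \in \Omega^{s+1}$;

2) $(\mathcal{C}^{s-1} - \{j_{s+1}\}) \subset \mathcal{C}^{s+1}$.

Proof: The first statement has already been proved. From (28) it follows that if $k \in \mathcal{C}^{s-1}$ and $k \ne j_{s+1}$, then $D^{s+1}_k = 0$ and $G^{s+1}_k = y_k - y_{j_{s+1}} \ne 0$. Thus $k \in \mathcal{C}^{s+1} - \{j_{s+1}\}$. □

Thus at step 3 of Algorithm 1, indices $k \in \mathcal{C}^{s-1} \cup \mathcal{C}^s \cup \Omega^s$ can be excluded from consideration since, over the next two iterations, the computations involving them are predictable. This leads to the following final version of the two index-set single-step algorithm.

Algorithm 2:

1) Initialization: $\Psi^0_{1,1} = 1$; $\Psi^0_{2,1} = 0$; $D^0_i = D_i$, $G^0_i = G_i$; $s = 1$; $\mathcal{C}^0 = \emptyset$;

2) If $\mathcal{C}^{s-2} \ne \emptyset$, choose $j_s$ from $\mathcal{C}^{s-2}$; otherwise choose $j_s$ such that $D^{s-1}_{j_s} \ne 0$; if there is no such $j_s$: stop;

3) Do in parallel for $k = 1$ to $2t$, and $k \notin \Omega^{s-1} \cup \mathcal{C}^{s-1} \cup \mathcal{C}^{s-2}$:

$$D^s_k = D^{s-1}_{j_s} G^{s-1}_k - G^{s-1}_{j_s} D^{s-1}_k, \qquad G^s_k = D^{s-1}_k\,(y_k - y_{j_s});$$

4) $\Psi^s_{1,1} = -G^{s-1}_{j_s}\Psi^{s-1}_{1,1} + D^{s-1}_{j_s}\Psi^{s-1}_{2,1}$; $\Psi^s_{2,1} = (z - y_{j_s})\,\Psi^{s-1}_{1,1}$;

5) $s = s + 1$, go to step 2.

At the $s$th step of Algorithm 2, $\mathcal{C}^{s-2}$ is a "priority index-set" in the sense that, when $\mathcal{C}^{s-2} \ne \emptyset$, its elements can be selected as $j_s$ with predictable effect. Since the way the data are arranged in $\mathcal{C}^s$ is not important, the data structure which implements $\mathcal{C}^s$ can be a stack, a queue, or another suitable structure. Also note that at each step of the algorithm, two distinct index-sets $\mathcal{C}^{s-1}$ and $\mathcal{C}^s$ are involved, and recall that $(\mathcal{C}^{s-2} - \{j_s\}) \subset \mathcal{C}^s$.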


B. A Two-step Algorithm

In the algorithms of the previous subsection, in each step an exact sequence corresponding to one single homomorphism $\phi_{j_s}$ was constructed. At each iteration only one equation from the WB equations (5) was solved. In the following, an algorithm which constructs the exact sequence corresponding to two of the WB equations simultaneously, while retaining the parallel-pipeline implementation, is considered. Recall that the common idea behind these algorithms is to construct an exact sequence and to apply Lemma 4 to solve the set of equations (5) recursively. While the resulting two-step algorithm is generally similar to the single-step algorithm of the previous subsection, certain new notions are required.

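Consider the two homomorphisms $\phi_{i_1}$ and $\phi_{i_2}$ given by

$$\phi_{i_1}(w(z), n(z)) = D_{i_1} w(y_{i_1}) + G_{i_1} n(y_{i_1}) \qquad (29)$$

and

$$\phi_{i_2}(w(z), n(z)) = D_{i_2} w(y_{i_2}) + G_{i_2} n(y_{i_2}) \qquad (30)$$

and let $\bar\phi_i = (\phi_{i_1}, \phi_{i_2})$ be the Cartesian product of the two homomorphisms. Define another homomorphism by

$$\bar\psi_i(w(z), n(z)) = (w(z), n(z))\, B_i(z)$$

where $B_i(z)$ is the $2 \times 2$ matrix given in the proof of Lemma 9 below.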

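Lemma 9: If

$$\Delta_i = D_{i_1} G_{i_2} - D_{i_2} G_{i_1} \ne 0$$

then the sequence

$$\mathbb{K}[z]^2 \xrightarrow{\bar\psi_i} \mathbb{K}[z]^2 \xrightarrow{\bar\phi_i} \mathbb{K}^2$$

is exact.

Proof: The two rows of

$$B_i(z) = \begin{bmatrix} -G_{i_1}(z - y_{i_2}) & D_{i_1}(z - y_{i_2}) \\ -G_{i_2}(z - y_{i_1}) & D_{i_2}(z - y_{i_1}) \end{bmatrix}$$

are two elements of $\operatorname{Ker}(\bar\phi_i)$, and $\det B_i = \Delta_i (z - y_{i_1})(z - y_{i_2})$ with $\Delta_i \ne 0$; the lemma follows from Lemma 3. □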
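Lemma 9 can be checked the same way, together with the parameter update for $\phi_k \circ \bar\psi_i$ that step 3 of Algorithm 3 below needs. In this sketch GF(13), the sample data, and the helper names are assumptions. The rows of $B_i$ are tested against both homomorphisms, $\det B_i$ is compared with $\Delta_i(z - y_{i_1})(z - y_{i_2})$ at every field element, and the updated parameters are compared with a direct evaluation of $\phi_k \circ \bar\psi_i$.

```python
P = 13  # assumption: GF(13); polynomials are low-to-high coefficient lists


def peval(a, x):
    acc = 0
    for c in reversed(a):
        acc = (acc * x + c) % P
    return acc


def phi(D, G, y, w, n):
    """phi(w, n) = D w(y) + G n(y)."""
    return (D * peval(w, y) + G * peval(n, y)) % P


def two_step_basis(D1, G1, y1, D2, G2, y2):
    """Rows of B_i(z): (-G1 (z - y2), D1 (z - y2)) and (-G2 (z - y1), D2 (z - y1))."""
    return [[[G1 * y2 % P, -G1 % P], [-D1 * y2 % P, D1 % P]],
            [[G2 * y1 % P, -G2 % P], [-D2 * y1 % P, D2 % P]]]


def two_step_update(D1, G1, y1, D2, G2, y2, Dk, Gk, yk):
    """Parameters (D_k', G_k') with phi_k o psi_bar_i (w, n) = D_k' w(y_k) + G_k' n(y_k)."""
    return ((yk - y2) * (D1 * Gk - G1 * Dk) % P,
            (yk - y1) * (D2 * Gk - G2 * Dk) % P)


# Hypothetical data for two equations with Delta_i = D1 G2 - D2 G1 != 0.
D1, G1, y1, D2, G2, y2 = 4, 1, 2, 9, 1, 8
delta = (D1 * G2 - D2 * G1) % P
B = two_step_basis(D1, G1, y1, D2, G2, y2)
```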

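Note that $\Delta_i = 0$ means the two vectors $(D_{i_1}, G_{i_1})$ and $(D_{i_2}, G_{i_2})$ are parallel, i.e., they are scalar multiples of each other. Let $u \parallel v$ denote that $u$ is parallel to $v$. To apply Lemma 4, select $i^{(1)} = \{i^{(1)}_1, i^{(1)}_2\}$ such that

$$(D_{i^{(1)}_1}, G_{i^{(1)}_1}) \nparallel (D_{i^{(1)}_2}, G_{i^{(1)}_2}).$$

By Lemma 9 the sequence

$$\mathbb{K}[z]^2 \xrightarrow{\bar\psi_{i^{(1)}}} \mathbb{K}[z]^2 \xrightarrow{\bar\phi_{i^{(1)}}} \mathbb{K}^2$$

is exact. Define

$$M_1 = \bigcap_i \operatorname{Ker}(\phi^1_i)$$

where

$$\phi^1_i = \phi_i \circ \bar\psi_{i^{(1)}}$$

for any index $i$. By Lemma 4

$$M = \bar\psi_{i^{(1)}}(M_1).$$

Since the rows of $B_{i^{(1)}}$ lie in the kernels of both $\phi_{i^{(1)}_1}$ and $\phi_{i^{(1)}_2}$, note that

$$\phi^1_{i^{(1)}_1} = \phi^1_{i^{(1)}_2} = 0$$

implying that the two equations corresponding to indices $i^{(1)}_1$ and $i^{(1)}_2$ have been solved.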

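At the next iteration select two homomorphisms $\phi^1_{i^{(2)}_1}$, $\phi^1_{i^{(2)}_2}$ from the set $\{\phi^1_1, \phi^1_2, \ldots, \phi^1_{2t}\}$ such that

$$(D^1_{i^{(2)}_1}, G^1_{i^{(2)}_1}) \nparallel (D^1_{i^{(2)}_2}, G^1_{i^{(2)}_2})$$

where $i^{(2)} = \{i^{(2)}_1, i^{(2)}_2\}$, and similarly construct the new set of homomorphisms $\{\phi^2_1, \phi^2_2, \ldots, \phi^2_{2t}\}$. This procedure is repeated until the point is reached where there is no pair $(i^{(l+1)}_1, i^{(l+1)}_2)$ such that

$$(D^l_{i^{(l+1)}_1}, G^l_{i^{(l+1)}_1}) \nparallel (D^l_{i^{(l+1)}_2}, G^l_{i^{(l+1)}_2})$$

i.e., all the remaining vectors $[D^l_i, G^l_i]$ lie on the same line in $\mathbb{K}^2$. Similar to the single-step algorithms, define the homomorphism $\bar\psi$ and the polynomial matrix $T$ as

$$\bar\psi = \bar\psi_{i^{(1)}} \circ \bar\psi_{i^{(2)}} \circ \cdots \circ \bar\psi_{i^{(l)}}, \qquad T(z) = B_{i^{(l)}}(z) \cdots B_{i^{(2)}}(z)\, B_{i^{(1)}}(z). \qquad (32)$$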

It is readily observed that the vectors $[D^l_i, G^l_i]$ are all zero only if $l = t$, since otherwise $\operatorname{Ker}(\bar\phi^l) = \mathbb{K}[z]^2$, which means that $T$ is a basis matrix for $M$; however, $g(z) \mid \det T$ only if $l = t$. Consider two cases, first when $l < t$, and second when $l = t$. If $l < t$, there exists a vector $[D^l_{i_1}, G^l_{i_1}] \ne 0$, for some index $i_1$, such that for every $i$

$$[D^l_i, G^l_i] = a_i\, [D^l_{i_1}, G^l_{i_1}].$$

Therefore, $[-G^l_{i_1}, D^l_{i_1}] \in \operatorname{Ker}(\phi^l_i)$ for every $i$, and

$$[\Gamma(z), \Lambda(z)] = [-G^l_{i_1}, D^l_{i_1}]\, T \in M.$$

By (32), the degrees of the polynomial entries of $T$ are bounded by $l$, and therefore $\deg(\Gamma) < t$ and $\deg(\Lambda) < t$, that is, $(\Gamma, \Lambda)$ is the solution of the WB equations.

In the case that $l = t$, as already noted, $T$ is a basis matrix for $M$, and either row of $T$ satisfying the degree condition is the unique solution of the WB equations; otherwise, it must be that $\deg(T_{1,2}) = \deg(T_{2,2}) = t$. In the latter case, even though neither of the two rows of $T$ satisfies the degree condition, it is clear that by only one step of reduction one can obtain the solution. The two-step algorithm is summarized as:

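Algorithm 3:

1) Initialization: $T^0_{1,1} = 1$; $T^0_{1,2} = 0$; $T^0_{2,1} = 0$; $T^0_{2,2} = 1$; $D^0_i = D_i$; $G^0_i = G_i$; $s = 1$;

2) Choose a pair of indices $i^{(s)} \stackrel{\text{def}}{=} \{i^{(s)}_1, i^{(s)}_2\}$ such that

$$(D^{s-1}_{i^{(s)}_1}, G^{s-1}_{i^{(s)}_1}) \nparallel (D^{s-1}_{i^{(s)}_2}, G^{s-1}_{i^{(s)}_2});$$

if there is no such pair, go to step 6;

3) Do in parallel for $k = 1$ to $2t$:

$$D^s_k = (y_k - y_{i^{(s)}_2})\left(D^{s-1}_{i^{(s)}_1} G^{s-1}_k - G^{s-1}_{i^{(s)}_1} D^{s-1}_k\right), \qquad G^s_k = (y_k - y_{i^{(s)}_1})\left(D^{s-1}_{i^{(s)}_2} G^{s-1}_k - G^{s-1}_{i^{(s)}_2} D^{s-1}_k\right);$$

4) $T^s = B_{i^{(s)}}\, T^{s-1}$;

5) $s = s + 1$, go to step 2;

6) Define $[\gamma, \lambda]$:

a) if $s \le t$, then $[\gamma, \lambda] = [-G^{s-1}_{i_s}, D^{s-1}_{i_s}] \ne 0$, for some index $i_s$;

b) if $s = t + 1$, then $[\gamma, \lambda] = [-T^{s-1,t}_{2,2}, T^{s-1,t}_{1,2}]$. (Note that $T^{s-1,t}_{2,2}$ and $T^{s-1,t}_{1,2}$ are the coefficients of $z^t$ in the polynomials $T^{s-1}_{2,2}$ and $T^{s-1}_{1,2}$, respectively.)

7) Compute

$$[\Gamma(z), \Lambda(z)] = [\gamma, \lambda]\, T^{s-1}.$$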

As with the single-step algorithms, a two-polynomial version of Algorithm 3 can be derived. One can also reduce the computational load of the algorithm by a systematic search over the indices in step 2 of Algorithm 3. The details of these modifications are omitted.

V. A LINEAR ARRAY ARCHITECTURE

In this section a linear systolic array implementation of Algorithm 2, consisting of $2t$ cells and capable of solving the WB equations (5), is described. The basic idea is first explained. In the two algorithms described in the previous section, the set of homomorphisms $\{\psi_{j_1}, \psi_{j_2}, \ldots, \psi_{j_l}\}$ that led to the solution of the WB equations was determined. For the single-step algorithm $l \le 2t$, and for the two-step algorithm $l \le t$. At the $s$th step of the algorithm, each of the homomorphisms $\phi^s_i$ is obtained as follows:

$$\phi^s_i = \phi^{s-1}_i \circ \psi_{j_s}. \qquad (33)$$

In essence, each cell of the systolic array is to be a realization of these transformations (33). For the single-step algorithm the registers in the sth cell must be initialized by (D;S', GqSp1, yj,), since $js is completely specified by these parameters. Similarly, as each homomorphism $-' is identified by the parameters (DJ-', G;-', yj), the parameters (D; , G;) of the homomorphism 4; are obtained by the arithmetic operations

Each cell must have enough computational resources to perform one operation of the kind specified in (34). Each of the transformations (33) is assigned to one cell of the systolic array at an appropriate time. The array operates in a pipelined fashion: while the $s$th cell performs the transformation $\phi_k^{(s)} = \phi_k^{(s-1)} \circ \psi_{j_s}$, the $(s-1)$st cell performs the operation $\phi_{k+1}^{(s-1)} = \phi_{k+1}^{(s-2)} \circ \psi_{j_{s-1}}$. The transformations are assigned to the cells so that the pipeline operation of the array is preserved.
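The following Python sketch illustrates (34) by implementing one cell over GF(64), using the primitive polynomial $z^6 + z + 1$ from the example in this section. It makes two assumptions. First, $\phi_j$ acts on a column $[a(z); b(z)]$ as $D_j a(y_j) + G_j b(y_j)$. Second, (34) takes the form $D_j^{(s)} = D_{j_s}^{(s-1)} G_j^{(s-1)} - G_{j_s}^{(s-1)} D_j^{(s-1)}$, $G_j^{(s)} = (y_j - y_{j_s}) D_j^{(s-1)}$. This form is our reading of the lost display, chosen because it reproduces the intermediate values quoted in the example. The helper names `mul`, `alpha`, and `cell_update` are hypothetical. Each update uses three field multiplications. Feeding a cell its own pivot returns $D = G = 0$, which expresses $\phi_{j_s}^{(s-1)} \circ \psi_{j_s} = 0$.

```python
# GF(64) arithmetic built from the primitive polynomial z^6 + z + 1.
EXP, LOG = [0] * 126, [0] * 64
_x = 1
for _i in range(63):
    EXP[_i] = EXP[_i + 63] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x40:        # reduce modulo z^6 + z + 1
        _x ^= 0x43


def mul(a, b):
    """Multiply two field elements; addition and subtraction are both XOR."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def alpha(i):
    """Return alpha**i as an integer bit pattern."""
    return EXP[i % 63]


def cell_update(pivot, datum):
    """Apply the transformation held by one cell to a datum passing through it.

    pivot = (Dp, Gp, yp): registers of the cell, which specify psi_{j_s}.
    datum = (D, G, y):    parameters of the incoming homomorphism phi_j.
    Exactly three field multiplications are used.
    """
    Dp, Gp, yp = pivot
    D, G, y = datum
    D_new = mul(Dp, G) ^ mul(Gp, D)   # D_{j_s} G_j - G_{j_s} D_j
    G_new = mul(y ^ yp, D)            # (y_j - y_{j_s}) D_j
    return (D_new, G_new, y)


pivot1 = (alpha(17), 1, alpha(7))     # (D_1, G_1, y_1) from the example
print(cell_update(pivot1, (alpha(17), 1, alpha(6))) == (0, alpha(29), alpha(6)))          # True
print(cell_update(pivot1, (alpha(14), 1, alpha(5))) == (alpha(46), alpha(31), alpha(5)))  # True
print(cell_update(pivot1, pivot1)[:2])  # (0, 0): a cell annihilates its own homomorphism
```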

A simple transformation assignment scheme that works with the single-step algorithm is described here; a similar scheme can be given for the two-step algorithm. The first inputs of the linear array are $(D_1, G_1, y_1)$, which identify $\phi_1$. Assume that $D_1 \neq 0$. Then $j_1$ can be set to 1, and
$$\psi_{j_1} = \begin{bmatrix} -G_1 & z - y_1 \\ D_1 & 0 \end{bmatrix}.$$

The first cell of the systolic array implements the transformation $\phi_j^{(1)} = \phi_j \circ \psi_{j_1}$ for $j = 2, \ldots, 2t$. As the second set of data $(D_2, G_2, y_2)$, associated with the homomorphism $\phi_2$, passes through the first cell, the first cell computes the transformation $\phi_2^{(1)} = \phi_2 \circ \psi_{j_1}$.

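If $D_2^{(1)} \neq 0$, then choose $j_2 = 2$, and
$$\psi_{j_2} = \begin{bmatrix} -G_2^{(1)} & z - y_2 \\ D_2^{(1)} & 0 \end{bmatrix}$$
is assigned to the second cell in the array. If $D_2^{(1)} = 0$, that is, $2 \in C_1$, then $j_2$ cannot be chosen as 2. However, it is known that $D_2^{(2)} \neq 0$, so $j_3$ can be set to 2. Thus $y_2$ can be pushed along the array until it reaches the third cell, where we set
$$\psi_{j_3} = \begin{bmatrix} 0 & z - y_2 \\ 1 & 0 \end{bmatrix}.$$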

The second cell receives the triple $(D_3^{(1)}, G_3^{(1)}, y_3)$ and again checks the condition $D_3^{(1)} \neq 0$. If this condition is satisfied, $j_2 = 3$ is chosen, and
$$\psi_{j_2} = \begin{bmatrix} -G_3^{(1)} & z - y_3 \\ D_3^{(1)} & 0 \end{bmatrix}.$$



Otherwise, it is known that $3 \in C_1$. Since $j_3 = 2$ has already been chosen, $j_5$ can be chosen to be 3. In this case, $y_3$ has to be pushed along the array until it reaches the fifth cell, and $\psi_{j_5}$ is set to
$$\psi_{j_5} = \begin{bmatrix} 0 & z - y_3 \\ 1 & 0 \end{bmatrix}.$$
The above discussion leads to the following assignment rules, which ensure that each transformation (33) is assigned to the correct cell.

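1) Data pass unchanged through a cell that is not assigned to any transformation.

2) If $D_k^{(s-1)} \neq 0$ and the $s$th cell is not already assigned to a transformation, then set $j_s = k$, and
$$\psi_{j_s} = \begin{bmatrix} -G_k^{(s-1)} & z - y_k \\ D_k^{(s-1)} & 0 \end{bmatrix}.$$

3) If $D_k^{(s-1)} = 0$ and neither the $s$th cell nor the $(s+1)$st cell is assigned, then set $j_{s+1} = k$, and
$$\psi_{j_{s+1}} = \begin{bmatrix} 0 & z - y_k \\ 1 & 0 \end{bmatrix}.$$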

The above assignment rules are not unique, and the reader can easily conceive of different organizations for computations of Algorithm 2.
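The following time-stepped Python sketch of the first phase shows how the three rules interact with pipelining. It is run on the data $D$, $G$, $y$ of the example that follows. The modelling assumptions are ours:

- datum $k$ reaches cell $s$ at time $k + s$ (both 0-based);
- an assigned cell updates a datum by (34) in the form $D' = D_p G - G_p D$, $G' = (y - y_p) D$;
- a datum pushed by rule 3 is stored as the triple $(1, 0, y)$, so that the same update realizes $\psi = [[0, z - y], [1, 0]]$;
- the phase stops as soon as the last datum has been assigned to a cell.

`phase_one` and the other names are hypothetical.

```python
# GF(64) arithmetic, primitive polynomial z^6 + z + 1.
EXP, LOG = [0] * 126, [0] * 64
_x = 1
for _i in range(63):
    EXP[_i] = EXP[_i + 63] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x40:
        _x ^= 0x43


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def alpha(i):
    return EXP[i % 63]


def cell_update(pivot, datum):
    """Assumed form of (34); subtraction is XOR in characteristic 2."""
    (Dp, Gp, yp), (D, G, y) = pivot, datum
    return (mul(Dp, G) ^ mul(Gp, D), mul(y ^ yp, D), y)


def phase_one(data, ncells):
    """Hypothetical model of the first phase under assignment rules 1)-3).

    Returns the (D, G, y) triple stored in each cell, or None if unassigned.
    """
    n = len(data)
    cells, state, alive = [None] * ncells, list(data), [True] * n
    for T in range(n + ncells):
        for s in range(ncells):
            k = T - s                                  # datum sitting at cell s
            if not 0 <= k < n or not alive[k]:
                continue
            if cells[s] is not None:                   # assigned cell: apply (34)
                state[k] = cell_update(cells[s], state[k])
                alive[k] = s + 1 < ncells              # drop data leaving the array
                continue
            D, G, y = state[k]
            if D != 0:
                cells[s] = (D, G, y)                   # rule 2
            elif s + 1 < ncells and cells[s + 1] is None:
                cells[s + 1] = (1, 0, y)               # rule 3: psi = [[0, z - y], [1, 0]]
            else:
                alive[k] = s + 1 < ncells              # rule 1: pass through unchanged
                continue
            alive[k] = False                           # datum consumed by an assignment
            if k == n - 1:
                return cells                           # last datum assigned: stop
    return cells


ys = [alpha(7 - k) for k in range(8)]                  # y_1 = alpha^7, ..., y_8 = 1
Ds = [alpha(e) for e in (17, 17, 14, 33, 23, 54, 15, 29)]
data = [(D, 1, y) for D, y in zip(Ds, ys)]             # all G_k = 1
cells = phase_one(data, 8)
print([8 - LOG[c[2]] if c else None for c in cells])   # [1, 3, 2, 4, 8, 5, None, None]
```

Under these assumptions, the sketch assigns $y_1, y_3, y_2, y_4, y_8, y_5$ to cells 1 through 6. This matches the choices $j_1 = 1$, $j_2 = 3$, $j_3 = 2$, $j_4 = 4$, $j_5 = 8$, $j_6 = 5$ made in the example.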

Example: Consider the Reed-Solomon code with parameters $(n = 63, k = 55, r = 8)$ and generator polynomial
$$g(z) = \prod_{i=0}^{7} (z - \alpha^i)$$
over the finite field GF(64) with the primitive polynomial $z^6 + z + 1$. Assume that the codeword $c(z)$ is transmitted and that the channel introduces the error pattern $e(z) = \alpha^{53} + \alpha^{5} z^{34} + \alpha^{6} z^{45}$. After re-encoding, the remainder polynomial is
$$r(z) = \alpha^{22} z^7 + \alpha^{42} z^6 + \alpha^{29} z^5 + \alpha^{17} z^4 + \alpha^{10} z^3 + \alpha^{15} z^2 + \alpha^{55} z + \alpha^{55},$$
and then
$$D = [\alpha^{17}, \alpha^{17}, \alpha^{14}, \alpha^{33}, \alpha^{23}, \alpha^{54}, \alpha^{15}, \alpha^{29}],$$
$$G = [1, 1, 1, 1, 1, 1, 1, 1].$$
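A short Python sketch reproduces the example data under two assumptions:

- The remainder equals $e(z) \bmod g(z)$. This holds because $c(z)$ is a multiple of $g(z)$.
- Each $D_k$ is the coefficient of $r(z)$ at the check position with locator $y_k$, multiplied by $\prod_{i \ne k} (y_k - y_i)$. This relation is our assumption; this section does not state it, but it reproduces every entry of the vector $D$ listed above.

Positions are ordered as $y_1 = \alpha^7, \ldots, y_8 = 1$. The helper names are hypothetical.

```python
# GF(64) arithmetic, primitive polynomial z^6 + z + 1.
EXP, LOG = [0] * 126, [0] * 64
_x = 1
for _i in range(63):
    EXP[_i] = EXP[_i + 63] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x40:
        _x ^= 0x43


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def alpha(i):
    return EXP[i % 63]


def poly_mul(p, q):
    """Product of polynomials stored as coefficient lists (index = power of z)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= mul(a, b)
    return out


def poly_mod(num, den):
    """Remainder of num modulo a monic polynomial den."""
    num = list(num)
    for top in range(len(num) - 1, len(den) - 2, -1):
        coef = num[top]
        if coef:
            shift = top - (len(den) - 1)
            for i, d in enumerate(den):
                num[shift + i] ^= mul(coef, d)
    return num[:len(den) - 1]


g = [1]
for i in range(8):
    g = poly_mul(g, [alpha(i), 1])          # (z - alpha^i) = (z + alpha^i) here

e = [0] * 46
e[0], e[34], e[45] = alpha(53), alpha(5), alpha(6)
r = poly_mod(e, g)                          # remainder, lowest power first

y = [alpha(7 - k) for k in range(8)]        # y_1, ..., y_8
D = []
for k in range(8):
    c = 1
    for j in range(8):
        if j != k:
            c = mul(c, y[k] ^ y[j])         # prod_{j != k} (y_k - y_j)
    D.append(mul(r[7 - k], c))              # r coefficient at position of y_k

print([LOG[c] for c in r])   # [55, 55, 15, 10, 17, 29, 42, 22]
print([LOG[d] for d in D])   # [17, 17, 14, 33, 23, 54, 15, 29]
```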

The first inputs of the array are $(D_1 = \alpha^{17}, G_1 = 1, y_1 = \alpha^7)$ (see Table I). Note that the algorithms described in the last section are not sensitive to the order of the input data. Since $D_1 \neq 0$, set $j_1 = 1$ and assign the transformation
$$\psi_{j_1} = \begin{bmatrix} -1 & z - \alpha^7 \\ \alpha^{17} & 0 \end{bmatrix}$$
to the first cell of the array. At the second step, the inputs of the first cell are $(D_2 = \alpha^{17}, G_2 = 1, y_2 = \alpha^6)$, and the outputs are $(D_2^{(1)} = 0, G_2^{(1)} = \alpha^{29}, y_2 = \alpha^6)$. Since $D_2^{(1)} = 0$, $j_2$ cannot be set to 2, so the data $(D_2^{(1)} = 0, G_2^{(1)} = \alpha^{29}, y_2 = \alpha^6)$ pass through the second cell without change. Note that $G_2^{(1)}$ can be set to 1. At the third step, the input of the second cell is $(D_3^{(1)} = \alpha^{46}, G_3^{(1)} = \alpha^{31}, y_3 = \alpha^5)$. Since $D_3^{(1)} \neq 0$, $j_2$ can be chosen to be 3, and the following transformation is assigned to the second cell:
$$\psi_{j_2} = \begin{bmatrix} -\alpha^{31} & z - \alpha^5 \\ \alpha^{46} & 0 \end{bmatrix}.$$

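By the third assignment rule, the following transformation is assigned to the third cell:
$$\psi_{j_3} = \begin{bmatrix} 0 & z - \alpha^6 \\ 1 & 0 \end{bmatrix}.$$
At the sixth step, after passing through the second and third cells, $(D_4^{(3)} = \alpha^{60}, G_4^{(3)} = \alpha^2, y_4 = \alpha^4)$ is the input to the fourth cell. Since $D_4^{(3)} \neq 0$, set $j_4 = 4$, and initialize the fourth cell to perform the following transformation:
$$\psi_{j_4} = \begin{bmatrix} -\alpha^{2} & z - \alpha^4 \\ \alpha^{60} & 0 \end{bmatrix}.$$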

At the eighth step, the input to the fifth cell is $(D_5^{(4)} = 0, G_5^{(4)} = \alpha^{42}, y_5 = \alpha^3)$. Since $D_5^{(4)} = 0$, $j_5$ cannot be set to 5, so these data are passed to the next cell in the array. In this way, the sixth cell is initialized (at the tenth step) to perform the operation
$$\psi_{j_6} = \begin{bmatrix} 0 & z - \alpha^3 \\ 1 & 0 \end{bmatrix}.$$

At the ninth step, the input to the fifth cell is $(D_6^{(4)} = 0, G_6^{(4)} = \alpha^{46}, y_6 = \alpha^2)$. Since $D_6^{(4)} = 0$, again $j_5$ cannot be set to 6, and the data are passed to the sixth cell. At the tenth step, the input to the fifth cell is $(D_7^{(4)} = 0, G_7^{(4)} = \alpha^{22}, y_7 = \alpha)$. Since $D_7^{(4)} = 0$, $j_5$ cannot be set to 7, and the data are passed to the sixth cell. At the eleventh step, the input to the fifth cell is $(D_8^{(4)} = \alpha^{47}, G_8^{(4)} = \alpha^{42}, y_8 = 1)$, and the fifth cell is not yet assigned. Since $D_8^{(4)} \neq 0$, $j_5$ can be set to 8 and
$$\psi_{j_5} = \begin{bmatrix} -\alpha^{42} & z - 1 \\ \alpha^{47} & 0 \end{bmatrix}.$$
The time at which the last datum is assigned to a cell can be taken as the stopping condition for this phase of the algorithm. At that time, the datum associated with $y_6 = \alpha^2$ is the input to the seventh cell.

Fig. 1. The organization of the cells in the array.

TABLE I
INPUTS TO THE ARRAY AND THE OUTPUT OF THE CELLS

Finally, the following product of $2 \times 2$ matrices defined over polynomials must be found:
$$\Psi = \psi_{j_1} \psi_{j_2} \cdots \psi_{j_l}, \quad (36)$$
which in this example has $l = 6$ factors.

A linear systolic array can also implement this step easily; as shown below, the array used in the first phase can even be reused. It follows from (36) that
$$[\Psi_{1,1}(z), \Psi_{1,2}(z)] = [\alpha^{43} z^3 + z^2 + \alpha^{35} z + \alpha^{59},\; \alpha^{33} z^3 + \alpha^{47} z^2 + \alpha^{7} z + \alpha^{5}].$$

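The error locator polynomial $\Psi_{1,1}(z)$ factors as
$$\Psi_{1,1}(z) = \alpha^{43} (z + 1)(z + \alpha^{34})(z + \alpha^{45}).$$
Its roots $1, \alpha^{34}, \alpha^{45}$ agree with the exponents 0, 34, and 45 of the error pattern $e(z)$. The error values can then be obtained from the error evaluation formula (23).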
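The second phase can be checked numerically with the Python sketch below. It forms the first row of $\psi_{j_1}\psi_{j_2}\cdots\psi_{j_6}$ for the six transformations assigned in the example, one matrix (that is, one cell) at a time, and then finds the roots of $\Psi_{1,1}$ by exhaustive search over the field. It assumes each factor has the form $[[-G, z - y], [D, 0]]$, with the pushed transformations written as $(D, G) = (1, 0)$. Under this form, each coefficient of the update $[a, b] \mapsto [-G a + D b, (z - y) a]$ needs three multiplications, in line with the three multipliers per cell discussed below. The function names are hypothetical.

```python
# GF(64) arithmetic, primitive polynomial z^6 + z + 1.
EXP, LOG = [0] * 126, [0] * 64
_x = 1
for _i in range(63):
    EXP[_i] = EXP[_i + 63] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x40:
        _x ^= 0x43


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def alpha(i):
    return EXP[i % 63]


def trim(p):
    """Drop zero high-order coefficients, keeping at least one entry."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def first_row(cells):
    """First row [Psi_11, Psi_12] of the ordered product of the cells' matrices."""
    a, b = [1], [0]                       # first row of the 2x2 identity
    for D, G, y in cells:
        m = max(len(a), len(b))
        a, b = a + [0] * (m - len(a)), b + [0] * (m - len(b))
        # [a, b] @ [[-G, z - y], [D, 0]] = [-G a + D b, (z - y) a]
        new_a = [mul(G, a[k]) ^ mul(D, b[k]) for k in range(m)]
        new_b = [0] + a                   # z * a
        for k in range(m):
            new_b[k] ^= mul(y, a[k])      # ... - y * a
        a, b = trim(new_a), trim(new_b)
    return a, b


def evaluate(p, x):
    """Horner evaluation of p at the field element x."""
    acc = 0
    for c in reversed(p):
        acc = mul(acc, x) ^ c
    return acc


# (D, G, y) held by cells 1-6 in the example (pushed data stored as (1, 0, y)).
cells = [(alpha(17), 1, alpha(7)), (alpha(46), alpha(31), alpha(5)),
         (1, 0, alpha(6)), (alpha(60), alpha(2), alpha(4)),
         (alpha(47), alpha(42), 1), (1, 0, alpha(3))]
psi11, psi12 = first_row(cells)
print([LOG[c] for c in psi11], [LOG[c] for c in psi12])  # [59, 35, 0, 43] [5, 7, 47, 33]
print([i for i in range(63) if evaluate(psi11, alpha(i)) == 0])  # [0, 34, 45]
```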

As the above example shows, the algorithm consists of two phases. The first phase computes the matrices $\psi_{j_1}, \psi_{j_2}, \ldots$, and the second phase computes the product (36). The following explains a schedule under which both phases of the algorithm run on an array of $2t$ cells, each requiring only three multipliers. An almost completely pipelined operation is assumed: the data for the next instance of the problem arrive at the first cell of the array at time $r + 2$. This assumption may seem too stringent for RS decoding, since two instances of the problem are separated by a gap of $n - r$. However, once a schedule is found under this assumption, it is easy to reschedule the computation into the time gap to reduce the number of multipliers required. For example, time-multiplexing can replace the three multipliers of each cell with one physical multiplier, which makes better use of the hardware. Note that the Euclidean algorithm [5] can achieve an equivalent complexity; however, the architecture explained here requires no syndrome computation.

The timing of the architecture is briefly explained. Denote the global time for the linear array by $T$. The computation begins at $T = 1$, when the first input $(D_1, G_1, y_1)$ arrives at the first cell. The first cell therefore begins its first-phase computation at $T = 1$ and ends it at $T = 2t$. If $D_1 \neq 0$, the second cell begins its first-phase computation at $T = 3$; otherwise it begins at $T = 2$. In general, the computation time of the $l$th cell is confined to the interval $[\lfloor (l-1)/2 \rfloor + l,\ r + l - 1]$. When the $l$th cell receives its first input, the first cell has already received at least $\lfloor (l-1)/2 \rfloor + l$ inputs, and the $l$th cell receives its last input at time $r + l - 1$. The second phase of the algorithm computes the product of polynomial matrices (36). Recall the definition of $\Psi_i$ in (21) and (22), and the corresponding bounds on the degrees of the polynomials. For simplicity of notation, the coefficient of $z^k$ in $\Psi^{i}_{1,1}(z)$ ($\Psi^{i}_{1,2}(z)$) is denoted by $\Psi^{i}_{11}[k]$ ($\Psi^{i}_{12}[k]$). Therefore
$$\Psi^{i}_{1,1}(z) = \sum_{k=0}^{m} \Psi^{i}_{11}[k]\, z^k, \qquad \Psi^{i}_{1,2}(z) = \sum_{k=0}^{m} \Psi^{i}_{12}[k]\, z^k.$$

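Note that $m \le \lceil i/2 \rceil$. Equation (22) can be rewritten as
$$\Psi^{i}_{11}[k] = D_{j_i}^{(i-1)}\, \Psi^{i-1}_{12}[k] - G_{j_i}^{(i-1)}\, \Psi^{i-1}_{11}[k], \qquad \Psi^{i}_{12}[k] = \Psi^{i-1}_{11}[k-1] - y_{j_i}\, \Psi^{i-1}_{11}[k],$$
with $\Psi^{i-1}_{11}[-1] = 0$,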

for $i = 1, 2, \ldots, 2t$ and $k = 0, 1, 2, \ldots, m$. A linear systolic array can readily implement this computation. The following schedule is assumed: the $i$th cell computes $\Psi^{i}_{11}[k]$ and $\Psi^{i}_{12}[k]$ at $T = r + k + i$. Both phases of the algorithm can then run on one linear array with negligible reduction in throughput. The $l$th cell of the array begins its computation no sooner than $\lfloor (l-1)/2 \rfloor + l$, and its first-phase computation finishes no later than $r + l - 1$. In the $l$th cell, the second phase begins at $r + l$ and finishes at $r + l + \lceil l/2 \rceil$. Assume that the next set of data arrives at the array at $T = r + 2$. Then the $l$th cell begins its computation on the new set of data at $T = r + l + \lfloor (l-1)/2 \rfloor + 2$. Since $r + l + \lfloor (l-1)/2 \rfloor + 2 > r + l + \lceil l/2 \rceil$, the first phase on the new set of data never conflicts with the second phase on the previous set. Because the two phases of the algorithm are quite similar, the control structure of the architecture is simple. Finally, a similar architecture can be derived for the two-step algorithm.
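The wavefront schedule $T = r + k + i$ can be checked mechanically with the Python sketch below; its function names are hypothetical. It lists the time at which cell $i$ produces coefficient $k$ for $0 \le k \le \lceil i/2 \rceil$. It assumes that coefficient $k$ of cell $i$ depends only on coefficients $k$ and $k - 1$ of cell $i - 1$, as in a product with a factor of degree one in $z$. It then confirms that every such input is produced at an earlier step, and it recovers the finishing time $r + i + \lceil i/2 \rceil$ of the second phase in cell $i$.

```python
import math


def second_phase_schedule(r, t):
    """Map (cell i, coefficient k) to the step T = r + k + i, for k <= ceil(i/2)."""
    return {(i, k): r + k + i
            for i in range(1, 2 * t + 1)
            for k in range(math.ceil(i / 2) + 1)}


def respects_dependencies(schedule):
    """True if every (i-1, k) and (i-1, k-1) input is ready before (i, k) needs it."""
    for (i, k), T in schedule.items():
        for dep in ((i - 1, k), (i - 1, k - 1)):
            if dep in schedule and schedule[dep] >= T:
                return False
    return True


sched = second_phase_schedule(r=8, t=4)
finish = {i: max(T for (c, _), T in sched.items() if c == i) for i in range(1, 9)}
print(respects_dependencies(sched), finish[8])   # True 20
```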

VI. COMMENTS

It has been shown how the module-theoretic solution technique for polynomial congruences derived in Section III leads to novel and efficient decoding algorithms for RS codes based on the WB equations. These decoding algorithms can be implemented with systolic arrays, which gives them significant implementation advantages. Other decoding algorithms, such as the Euclidean algorithm, have been implemented with systolic arrays [6], [7] and hypersystolic arrays [5] and have a similar complexity to the algorithms presented here. Nonetheless, three features make the architecture described here particularly attractive: the cells are simple, no syndrome computation is needed, and the remainder polynomial can be computed with encoding hardware that presumably already exists on the chip. Preliminary studies of the design of a VLSI implementation of this algorithm support this view.

ACKNOWLEDGMENT

The authors wish to thank the reviewers for their useful suggestions.

REFERENCES

[1] E. R. Berlekamp, “Bounded distance+1 soft decision Reed-Solomon decoding,” preprint.
[2] E. R. Berlekamp and L. Welch, “Error correction for algebraic block codes,” U.S. patent 4 633 470, 1986.
[3] T. Yaghoobian and I. F. Blake, “Two new decoding algorithms for Reed-Solomon codes,” AAECC, Springer Lect. Notes Comp., vol. 5. New York: Springer-Verlag, 1994, pp. 23-43.
[4] P. Fitzpatrick, “A new look at the key equation,” in IEEE Int. Symp. Information Theory (San Antonio, TX, Jan. 1993), p. 98.
[5] E. Berlekamp, G. Seroussi, and P. Tong, “A hypersystolic Reed-Solomon decoder,” in Reed-Solomon Codes and their Applications, S. B. Wicker and V. K. Bhargava, Eds. Piscataway, NJ: IEEE Press, 1994, ch. 10, pp. 205-241.
[6] T. K. Citron, “Algorithms and architectures for error correcting codes,” Ph.D. dissertation, Stanford Univ., Stanford, CA, 1985.
[7] H. M. Shao et al., “A VLSI design of a pipeline Reed-Solomon decoder,” IEEE Trans. Comput., vol. C-34, pp. 372-377, May 1985.
