
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: FUNDAMENTAL THEORY AND APPLICATIONS, VOL. 39, NO. 9, SEPTEMBER 1992, p. 705

On the Realization of Discrete Cosine Transform Using the Distributed Arithmetic

Yuk-Hee Chan, Student Member, IEEE, and Wan-Chi Siu, Senior Member, IEEE

Abstract-In this paper, we propose a unified approach for the realization of forward and inverse discrete cosine transforms. By making use of this approach, one can realize an odd prime length DCT/IDCT with two half-length convolutions without extra overhead in terms of the number of multiplications. This formulation is most suitable for realization using the distributed arithmetic. In such a case, typical convolvers can be used as the core unit for the hardware implementation of the transforms. Hence, an efficient unified DCT/IDCT chip can be designed. A 2-D 11 × 11 unified DCT/IDCT chip is also proposed to demonstrate the superiority of the proposed formulation. The proposed architecture can easily meet the speed requirement of 14.3-MHz real-time operation with the current 2-μm CMOS technology.

I. INTRODUCTION

The discrete cosine transform (DCT) [1] is widely used in digital image processing, especially in image transform coding, as it performs much like the optimal Karhunen-Loeve transform (KLT) [2] under a variety of criteria. Many algorithms [3]-[14] for the computation of the DCT have been proposed since the introduction of the DCT by Ahmed, Natarajan, and Rao [1] in 1974. However, though most of them are good software solutions to the realization of the DCT, only a few of them are really suitable for VLSI implementation.

Cyclic convolution plays an important role in digital signal processing because it is easy to implement. Specifically, there exist a number of well-developed convolution algorithms [15], and cyclic convolution can be easily realized through modular and structural hardware such as distributed arithmetic [16] and systolic arrays [17].

The way data are moved plays a significant part in determining the efficiency of the realization of a transform using the distributed arithmetic. The realization of a cyclic convolution with the distributed arithmetic requires only a simple table look-up technique and some simple rotations of the corresponding data set. Hence, the cyclic convolution structure can be considered the simplest form that is most suitable to be realized with the distributed arithmetic. For this reason, one may consider that the basic criterion for the realization of a transform using the distributed arithmetic relies on the possibility of having an efficient way to convert the transform into the cyclic convolution form. If we are able to convert a transform into the cyclic convolution form with the minimum number of operations, it would imply an optimal approach for the realization of the transform using the distributed arithmetic.

Manuscript received July 30, 1991; revised July 15, 1992. This paper was recommended by Associate Editor M. A. Soderstrand.
The authors are with the Department of Electronic Engineering, Hong Kong Polytechnic, Hung Hom, Kowloon, Hong Kong.
IEEE Log Number 9204228.

Some basic formulations [8]-[11] have been suggested for the realization of the DCT using the distributed arithmetic. These formulations either still require some extra multiplications [9], [10], or have to use cyclic convolutions of different lengths [8], [11]. The former case has the major problem that it forfeits the main advantage of the distributed arithmetic, which replaces multiplications by additions. The latter case requires relatively complicated circuitry to allow the realization of cyclic convolutions of variable lengths. Different from the above approaches, one may also convert the DCT into the Discrete Fourier Transform (DFT) [3], [13] and make use of the famous algorithms [18], [19] to convert the corresponding DFT into cyclic convolution form. Indeed, this is a possible approach; however, it turns a real transform into a transform with complex numbers. The realization could still be complicated even if some simplification techniques are applied.

In this paper, we propose an algorithm to convert an odd prime length DCT/IDCT into two half-length cyclic convolutions directly. This algorithm involves no multiplication during the conversion and suggests a possible way to design a unified DCT/IDCT chip. Due to the nature of the structure, this algorithm is most suitable for VLSI implementation using the distributed arithmetic. A 2-D 11 × 11 unified DCT/IDCT chip design is also provided in this paper to demonstrate the superiority of the proposed algorithm.

II. ONE-DIMENSIONAL DCT

The DCT [1] of data {y(i): i = 0, 1, ..., N - 1} is given by the following:

$$Y(k) = \sum_{i=0}^{N-1} y(i)\cos\left(\frac{\pi}{2N}(2i+1)k\right), \qquad k = 0, 1, \ldots, N-1. \tag{1}$$

1057-7122/92$03.00 © 1992 IEEE

    Authorized licensed use limited to: Hong Kong Polytechnic University. Downloaded on September 30, 2009 at 02:16 from IEEE Xplore. Restrictions apply.


If N is an odd number, there exists a bijective mapping on the set {i: i = 0, 1, ..., N - 1}:

$$\xi(i) = \left\langle \frac{N - 2i - 1}{2} \right\rangle_N \qquad \text{for } i = 0, 1, \ldots, N-1, \tag{2}$$

where $\langle\cdot\rangle_N$ denotes reduction modulo N. For example, if N = 11, we have {ξ(i)} = {5, 4, 3, 2, 1, 0, 10, 9, 8, 7, 6}, where i = 0, 1, 2, ..., 10 accordingly. By making use of this bijective mapping, we can split (1) and rewrite it as
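The index mapping can be checked with a few lines of Python. This is only a sketch: the function name `xi` is hypothetical, and the closed form ((N - 1)/2 - i) mod N is assumed because it reproduces the sequence listed above for N = 11.

```python
def xi(i: int, n: int) -> int:
    """Index mapping (2), assumed to be ((n - 1)/2 - i) mod n for odd n."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    return ((n - 1) // 2 - i) % n

# Reproduce the N = 11 example from the text.
mapping = [xi(i, 11) for i in range(11)]
print(mapping)  # [5, 4, 3, 2, 1, 0, 10, 9, 8, 7, 6]
```

Every index appears exactly once in the output, which is what makes the mapping bijective.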

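$$Y(2k) = (-1)^{k}\{f(0) + A(k)\}, \qquad Y(N-2k) = (-1)^{k+1}B(k) \qquad \text{for } k = 1, 2, \ldots, (N-1)/2 \tag{3}$$

where

$$A(k) = \sum_{i=1}^{N-1} f(i)\cos\left(\frac{2\pi ik}{N}\right), \qquad B(k) = \sum_{i=1}^{N-1} h(i)\sin\left(\frac{2\pi ik}{N}\right) \qquad \text{for } k = 1, 2, \ldots, (N-1)/2 \tag{4}$$

$$f(i) = y(\xi(i)), \qquad h(i) = (-1)^{\xi(i)}\, y(\xi(i)) \qquad \text{for } i = 0, 1, \ldots, N-1. \tag{5}$$

If N is an odd prime P, there exist two bijective mappings defined as

$$\varphi(i) = \langle g^{i} \rangle_P \quad \text{for } i = 1, 2, \ldots, P-1, \qquad \zeta(k) = \langle g^{-k} \rangle_P \quad \text{for } k = 1, 2, \ldots, P-1 \tag{6}$$

where g is a primitive root of P. To make use of these two mappings, one can redefine sequences {A(k)} and {B(k)} for k = 1, 2, ..., P - 1 as

$$A(k) = \sum_{i=1}^{P-1} f(i)\cos\left(\frac{2\pi ik}{P}\right) \tag{7a}$$

$$B(k) = \sum_{i=1}^{P-1} h(i)\sin\left(\frac{2\pi ik}{P}\right). \tag{7b}$$

Then both A(k) and B(k) defined in (7) can be converted into (P - 1)-length cyclic convolutions by mapping i and k to φ(i) and ζ(k), respectively. In formulation, we have

$$A(\zeta(k)) = \sum_{i=1}^{P-1} f(\varphi(i))\cos\left(\frac{2\pi \langle g^{i-k} \rangle_P}{P}\right) \qquad \text{for } k = 1, 2, \ldots, P-1 \tag{8a}$$

$$B(\zeta(k)) = \sum_{i=1}^{P-1} h(\varphi(i))\sin\left(\frac{2\pi \langle g^{i-k} \rangle_P}{P}\right) \qquad \text{for } k = 1, 2, \ldots, P-1. \tag{8b}$$

However, to make the algorithm more efficient, we can make a further simplification on (8a) and (8b). In particular, since $g^{(P-1)/2} \equiv -1 \pmod{P}$, we have

$$\varphi\big(i + (P-1)/2\big) = P - \varphi(i) \qquad \text{for } i = 1, 2, \ldots, (P-1)/2 \tag{9}$$

and

$$\zeta\big(k + (P-1)/2\big) = P - \zeta(k) \qquad \text{for } k = 1, 2, \ldots, (P-1)/2 \tag{10}$$

then (8a) and (8b) can be rewritten as (11a) and (11b), respectively:

$$A(\zeta(k)) = \sum_{i=1}^{(P-1)/2} \Big\{ f(\varphi(i)) + f\big(\varphi((P-1)/2 + i)\big) \Big\}\cos\left(\frac{2\pi \langle g^{i-k} \rangle_P}{P}\right) \qquad \text{for } k = 1, 2, \ldots, (P-1)/2 \tag{11a}$$

$$B(\zeta(k)) = \sum_{i=1}^{(P-1)/2} \Big\{ h(\varphi(i)) - h\big(\varphi((P-1)/2 + i)\big) \Big\}\sin\left(\frac{2\pi \langle g^{i-k} \rangle_P}{P}\right) \qquad \text{for } k = 1, 2, \ldots, (P-1)/2. \tag{11b}$$

Equations (11a) and (11b) are exactly a (P - 1)/2-length cyclic convolution and a (P - 1)/2-length skew-cyclic convolution, respectively. Whenever ζ(k) exceeds (P - 1)/2, the required term follows from A(P - k) = A(k) and B(P - k) = -B(k). Hence, A(k) and B(k) for k = 1, 2, ..., (P - 1)/2 defined in (4) can be realized through two (P - 1)/2-length convolutions (one cyclic convolution and one skew-cyclic convolution) with an additional cost of P - 1 additions.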
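The conversion above can be sketched in Python and checked against the direct definition (1). This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the index mapping is taken as ((P - 1)/2 - i) mod P, the i = 0 term f(0) is added to A(k) when forming Y(2k), and the symmetries A(P - k) = A(k), B(P - k) = -B(k) are used to fold the outputs.

```python
import math

def dct_direct(y):
    """Unnormalized DCT computed straight from definition (1)."""
    n = len(y)
    return [sum(y[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n)) for k in range(n)]

def dct_via_convolutions(y, g):
    """DCT of odd prime length p through one cyclic and one skew-cyclic
    convolution of length (p - 1)/2; g is assumed to be a primitive root of p."""
    p = len(y)
    half = (p - 1) // 2

    def xi(i):
        return (half - i) % p                      # mapping (2)

    f = [y[xi(i)] for i in range(p)]
    h = [(-1) ** xi(i) * y[xi(i)] for i in range(p)]

    def phi(i):
        return pow(g, i, p)                        # g^i mod p

    # Pre-additions: P - 1 additions/subtractions in total.
    u = [f[phi(i)] + f[phi(i + half)] for i in range(1, half + 1)]
    v = [h[phi(i)] - h[phi(i + half)] for i in range(1, half + 1)]

    out = [0.0] * p
    out[0] = f[0] + sum(u)                         # dc term
    for k in range(1, half + 1):
        a = b = 0.0
        for i in range(1, half + 1):
            m = pow(g, (i - k) % (p - 1), p)       # g^(i-k) mod p
            a += u[i - 1] * math.cos(2 * math.pi * m / p)
            b += v[i - 1] * math.sin(2 * math.pi * m / p)
        z = pow(g, p - 1 - k, p)                   # g^(-k) mod p; a = A(z), b = B(z)
        if z > half:                               # fold: A(p-z) = A(z), B(p-z) = -B(z)
            z, b = p - z, -b
        out[2 * z] = (-1) ** z * (f[0] + a)
        out[p - 2 * z] = (-1) ** (z + 1) * b
    return out

y = [float((3 * i * i + 1) % 7) for i in range(11)]
fast = dct_via_convolutions(y, g=2)
ref = dct_direct(y)
print(max(abs(a - b) for a, b in zip(fast, ref)) < 1e-9)
```

The inner double loop is written as a plain sum for readability; in hardware it corresponds to the two half-length convolvers.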

Let us use an example with P = 11 (primitive root g = 2) to clarify our approach.

First of all, we realize sequences {A(k): k = 1, 2, ..., 5} and {B(k): k = 1, 2, ..., 5} via a 5-length cyclic convolution and a 5-length skew-cyclic convolution, respectively. The dc term is then obtained as

$$Y(0) = \{y(3) + y(7)\} + \{y(1) + y(9)\} + \{y(2) + y(8)\} + \{y(0) + y(10)\} + \{y(4) + y(6)\} + y(5).$$

As the sequence {f(φ(i)) + f(φ((P - 1)/2 + i)): i = 1, 2, ..., (P - 1)/2} is computed during the realization of A(ζ(k)), the computation of Y(0) requires (P - 1)/2 additions only. In other words, a P-length DCT can be realized with two (P - 1)/2-length convolutions at a total cost of 2(P - 1) additions.

III. ONE-DIMENSIONAL IDCT

The IDCT of data {Y(k): k = 0, 1, ..., N - 1} is given by the following:

$$y(i) = \sum_{k=0}^{N-1} Y(k)\cos\left(\frac{\pi}{2N}(2i+1)k\right), \qquad i = 0, 1, \ldots, N-1. \tag{12}$$

If N is an odd number, (12) can be rewritten as

$$y(i) = \sum_{k=0}^{(N-1)/2} Y(2k)\cos\left(\frac{\pi}{2N}(2i+1)2k\right) + \sum_{k=1}^{(N-1)/2} Y(N-2k)\cos\left(\frac{\pi}{2N}(2i+1)(N-2k)\right), \qquad i = 0, 1, \ldots, N-1. \tag{13}$$

By making use of the bijective mapping defined in (2), equation (13) can be further rewritten as

$$y(\xi(i)) = Y(0) + G(i) + (-1)^{\xi(i)} H(i), \qquad y(\xi(N-i)) = Y(0) + G(i) - (-1)^{\xi(i)} H(i) \qquad \text{for } i = 1, 2, \ldots, (N-1)/2 \tag{14}$$

where

$$G(i) = \sum_{k=1}^{(N-1)/2} \{(-1)^{k} Y(2k)\}\cos\left(\frac{2\pi ik}{N}\right) \qquad \text{for } i = 1, 2, \ldots, (N-1)/2 \tag{15a}$$

$$H(i) = \sum_{k=1}^{(N-1)/2} \{(-1)^{k+1} Y(N-2k)\}\sin\left(\frac{2\pi ik}{N}\right) \qquad \text{for } i = 1, 2, \ldots, (N-1)/2. \tag{15b}$$

Obviously, by making use of the zero-padding technique, we can redefine sequences {G(i)} and {H(i)} as follows:

$$G(i) = \sum_{k=1}^{N-1} Y_e(k)\cos\left(\frac{2\pi ik}{N}\right) \qquad \text{for } i = 1, 2, \ldots, N-1 \tag{16}$$

where

$$Y_e(k) = \begin{cases} (-1)^{k} Y(2k) & \text{for } k = 1, 2, \ldots, (N-1)/2 \\ 0 & \text{else} \end{cases} \tag{17}$$

and

$$H(i) = \sum_{k=1}^{N-1} Y_o(k)\sin\left(\frac{2\pi ik}{N}\right) \qquad \text{for } i = 1, 2, \ldots, N-1 \tag{18}$$

where

$$Y_o(k) = \begin{cases} (-1)^{k+1} Y(N-2k) & \text{for } k = 1, 2, \ldots, (N-1)/2 \\ 0 & \text{else.} \end{cases} \tag{19}$$

Then G(i) and H(i) are exactly in the form of (7a) and (7b), respectively. In the previous section, we have shown that equations in the form of (7a) and (7b) can be converted into cyclic convolution form easily by using the mappings defined in (6) if N is an odd prime P. By using a similar approach, we can rewrite (16) and (18) as the following:

$$G(\varphi(i)) = \sum_{k=1}^{(P-1)/2} \Big\{ Y_e(\zeta(k)) + Y_e\big(\zeta((P-1)/2 + k)\big) \Big\}\cos\left(\frac{2\pi \langle g^{i-k} \rangle_P}{P}\right) \qquad \text{for } i = 1, 2, \ldots, (P-1)/2 \tag{20}$$

$$H(\varphi(i)) = \sum_{k=1}^{(P-1)/2} \Big\{ Y_o(\zeta(k)) - Y_o\big(\zeta((P-1)/2 + k)\big) \Big\}\sin\left(\frac{2\pi \langle g^{i-k} \rangle_P}{P}\right) \qquad \text{for } i = 1, 2, \ldots, (P-1)/2. \tag{21}$$

Equations (20) and (21) are a (P - 1)/2-length cyclic convolution and a (P - 1)/2-length skew-cyclic convolution, respectively. In such a case, an odd prime length IDCT can also be realized via two half-length convolutions, similar to the case for the DCT.

Note that no multiplication is involved as overhead for the conversion of an odd prime P-length IDCT into convolutions. As either Y_e(ζ(k)) or Y_e(ζ((P - 1)/2 + k)) is zero for k = 1, 2, ..., (P - 1)/2, no addition is required to compute the sequence {Y_e(ζ(k)) + Y_e(ζ((P - 1)/2 + k)): k = 1, 2, ..., (P - 1)/2}. A similar case occurs during the computation of the sequence {Y_o(ζ(k)) - Y_o(ζ((P - 1)/2 + k)): k = 1, 2, ..., (P - 1)/2}. Actually, only 2(P - 1) additions are required during the conversion. In other words, a P-length IDCT can be realized through two (P - 1)/2-length convolutions with a cost of 2(P - 1) additions. This is exactly the same cost as that of realizing a P-length DCT with convolutions.

Again, we use the example with N = 11 to clarify our approach.

To compute the sequence {G(i): i = 1, 2, ..., 5}, we can make use of (20), (17), and (6). The kernel coefficients of the resulting cyclic convolution are c(n) = cos(2nπ/11).

On the other hand, we can obtain sequence {H(i): i = 1, 2, ..., 5} by making use of (21), (19), and (6). The kernel coefficients of the resulting skew-cyclic convolution are s(n) = sin(2nπ/11).


Finally, we use (14) to compute the final result {y(i): i = 0, 1, ..., 10}. The remaining term, y(5), follows directly from (12):

$$y(5) = Y(0) - Y(2) + Y(4) - Y(6) + Y(8) - Y(10).$$

Both the DCT and the IDCT can be realized via convolutions with the same cost. Specifically, if both of them have the same length, one can make use of the same convolution module to realize both the forward and the inverse DCT. As cyclic convolution is the core module of this algorithm, the algorithm is most suitable for realization using the distributed arithmetic, and it also suggests an efficient and effective way to design a unified DCT/IDCT chip.
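The inverse transform can be sketched in the same way and compared with the direct definition (12). Again this is only an illustration under assumptions: the function names are hypothetical, the sign conventions for the zero-padded sequences (here `ye` and `yo`) are one consistent choice derived from (12), and the outputs are folded with G(P - n) = G(n) and H(P - n) = -H(n).

```python
import math

def idct_direct(Y):
    """Unnormalized IDCT computed straight from definition (12)."""
    n = len(Y)
    return [sum(Y[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n)) for i in range(n)]

def idct_via_convolutions(Y, g):
    """IDCT of odd prime length p via one cyclic and one skew-cyclic
    convolution of length (p - 1)/2; g is assumed to be a primitive root of p."""
    p = len(Y)
    half = (p - 1) // 2
    # Zero-padded, sign-adjusted inputs (assumed sign convention).
    ye = [0.0] * p
    yo = [0.0] * p
    for k in range(1, half + 1):
        ye[k] = (-1) ** k * Y[2 * k]
        yo[k] = (-1) ** (k + 1) * Y[p - 2 * k]

    def zeta(k):
        return pow(g, p - 1 - k, p)               # g^(-k) mod p

    # One operand of each pair is always zero, so no real addition is needed.
    u = [ye[zeta(k)] + ye[zeta(k + half)] for k in range(1, half + 1)]
    v = [yo[zeta(k)] - yo[zeta(k + half)] for k in range(1, half + 1)]

    y = [0.0] * p
    y[half] = Y[0] + sum(ye[1:half + 1])          # middle sample, additions only
    for i in range(1, half + 1):
        gc = gs = 0.0
        for k in range(1, half + 1):
            m = pow(g, (i - k) % (p - 1), p)      # g^(i-k) mod p
            gc += u[k - 1] * math.cos(2 * math.pi * m / p)
            gs += v[k - 1] * math.sin(2 * math.pi * m / p)
        n = pow(g, i, p)                          # gc = G(n), gs = H(n)
        if n > half:                              # fold: G(p-n) = G(n), H(p-n) = -H(n)
            n, gs = p - n, -gs
        sign = (-1) ** (half - n)
        y[half - n] = Y[0] + gc + sign * gs
        y[half + n] = Y[0] + gc - sign * gs
    return y

Y = [float((5 * k + 2) % 9) - 4.0 for k in range(11)]
rec = idct_via_convolutions(Y, g=2)
ref_inv = idct_direct(Y)
print(max(abs(a - b) for a, b in zip(rec, ref_inv)) < 1e-9)
```

The cosine and sine kernels indexed by g^(i-k) mod p are the same ones a forward transform of the same length would use, which is the property that lets one convolution module serve both directions.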

IV. VLSI IMPLEMENTATION OF UNIFIED DCT/IDCT CHIP

In the preceding sections, we have proposed an algorithm to convert a P-length DCT/IDCT into a half-length cyclic convolution and a half-length skew-cyclic convolution. This provides a straightforward but ideal solution for the VLSI implementation of a unified DCT/IDCT chip by making use of the distributed arithmetic.

Consider a cyclic convolution defined as

$$F(k) = \sum_{q=0}^{N-1} g(\langle q+k \rangle_N)\, C(q).$$

After scaling to a 2's-complement fractional number, each data word can be expressed as

$$g(\langle q+k \rangle_N) = -g(\langle q+k \rangle_N)_0 + \sum_{j=1}^{M-1} g(\langle q+k \rangle_N)_j\, 2^{-j},$$

where M, $g(\langle q+k \rangle_N)_j$, and $g(\langle q+k \rangle_N)_0$ are the word length, the jth most significant bit, and the sign bit, respectively. F(k) can then be rewritten as

$$F(k) = \sum_{j=1}^{M-1} \left\{ \sum_{q=0}^{N-1} g(\langle q+k \rangle_N)_j\, C(q) \right\} 2^{-j} - \sum_{q=0}^{N-1} g(\langle q+k \rangle_N)_0\, C(q).$$

Values of $\sum_{q} g(\langle q+k \rangle_N)_j\, C(q)$ can be precalculated and stored in a ROM of $2^N$ words. Then F(k) can be obtained by M ROM accesses and M - 1 shift-additions once the g(n)'s are available. Note that the same table can be used for the computation of F(k) for any value of k, which is impossible when computing inner products other than a cyclic convolution. Hence, to a certain extent, one can consider that the distributed arithmetic is most suitable for VLSI implementation of cyclic convolutions.

Several high-performance chips have been designed by making use of the distributed arithmetic [20]-[26]. However, in most designs, the distributed arithmetic is used to realize a typical inner product directly without first converting the transform into cyclic convolutions. In such a case, optimal performance of the distributed arithmetic cannot be achieved, and the consequence is that a large memory is required for the construction of the data tables.
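The table look-up idea can be illustrated with a small software model. This is a sketch under assumptions, not the chip's circuit: the function names are hypothetical, data words are modelled as Python integers holding M-bit two's-complement values that stand for the fraction x / 2^(M-1), and the convolution index is taken as (q + k) mod N.

```python
def build_rom(coeffs):
    """ROM[addr] = sum of coeffs[q] for every set bit q of addr (2**N words)."""
    n = len(coeffs)
    return [sum(c for q, c in enumerate(coeffs) if (addr >> q) & 1)
            for addr in range(1 << n)]

def da_inner_product(data, rom, m):
    """Sum of (data[q] / 2**(m-1)) * coeffs[q] using only ROM reads and shifts.
    data holds integers in [-2**(m-1), 2**(m-1))."""
    mask = (1 << m) - 1
    acc = 0.0
    for j in range(m):                      # j = 0 is the sign bit
        bit_pos = m - 1 - j
        addr = 0
        for q, x in enumerate(data):        # gather bit j of every data word
            addr |= (((x & mask) >> bit_pos) & 1) << q
        term = rom[addr]                    # one ROM access per bit plane
        acc += -term if j == 0 else term * 2.0 ** (-j)
    return acc

def da_cyclic_convolution(g, coeffs, m):
    """F(k) = sum_q g[(q + k) mod N] * C(q); one ROM serves every k."""
    n = len(g)
    rom = build_rom(coeffs)
    return [da_inner_product([g[(q + k) % n] for q in range(n)], rom, m)
            for k in range(n)]

coeffs = [0.5, -0.25, 0.125]                # illustrative kernel values
g = [37, -100, 5]                           # 8-bit two's-complement samples
F = da_cyclic_convolution(g, coeffs, m=8)
direct = [sum((g[(q + k) % 3] / 128.0) * coeffs[q] for q in range(3))
          for k in range(3)]
print(all(abs(a - b) < 1e-12 for a, b in zip(F, direct)))
```

Only the data rotation changes from one output to the next, so the ROM contents are computed once, which mirrors the remark that the same table serves every value of k.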

A P × P unified DCT/IDCT can be implemented by the row-column decomposition technique as shown in Fig. 1. In fact, the row-column approach is commonly applied in most 2-D DCT chips due to its flexible and regular nature. We first compute the P P × 1 DCT/IDCTs along each row and store the results in an intermediate array. We then compute the P P × 1 DCT/IDCTs along each column to yield the final results. Note that the intermediate memory is realized by a RAM of P × P words, and the transposition operation can be easily achieved by suitable control of the addresses of the intermediate array.
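The row-column decomposition can be sketched as follows. The function names are hypothetical, and the 1-D transform is written directly from definition (1) purely for illustration; in the proposed design it would be the convolution-based module.

```python
import math

def dct_1d(y):
    """Unnormalized 1-D DCT from definition (1)."""
    n = len(y)
    return [sum(y[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n)) for k in range(n)]

def dct_2d_row_column(block):
    """2-D DCT of a square block: P row transforms, transpose, P column transforms."""
    rows = [dct_1d(r) for r in block]                  # transform along each row
    cols = [dct_1d(list(c)) for c in zip(*rows)]       # transpose, then transform
    return [list(r) for r in zip(*cols)]               # transpose back

def dct_2d_direct(block):
    """Reference: the separable double sum evaluated directly."""
    n = len(block)
    def c(i, k):
        return math.cos(math.pi * (2 * i + 1) * k / (2 * n))
    return [[sum(block[i1][i2] * c(i1, k1) * c(i2, k2)
                 for i1 in range(n) for i2 in range(n))
             for k2 in range(n)] for k1 in range(n)]

block = [[1.0, 2.0, 0.0], [4.0, -1.0, 3.0], [0.5, 2.5, -2.0]]
rc = dct_2d_row_column(block)
ref2d = dct_2d_direct(block)
print(all(abs(rc[a][b] - ref2d[a][b]) < 1e-9 for a in range(3) for b in range(3)))
```

The two `zip(*...)` transpositions play the role of the address control on the intermediate array.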

Fig. 2 shows the block diagram of the one-dimensional unified DCT/IDCT module. The module mainly consists of three operating units, namely, an accumulator, a pre/post-processing unit, and a kernel-processing unit. Note that the whole process is a three-stage pipeline. The accumulator is responsible for the computation of the dc term in the DCT mode and the y((N - 1)/2) term in the IDCT mode, which involves additions or subtractions only. A typical accumulator can satisfy this requirement. The pre/post-processing unit is actually a typical adder which is responsible for the preparation of the input data for convolutions in the DCT mode and the computation of the final results from the convolution outputs in the IDCT mode. The arrangement of the pre/post-processing stage and the kernel-processing stage determines the configuration of the unified chip, which can be easily handled with multiplexers. The table provided in Fig. 2 specifies the relationship between the MUX configuration and the mode configuration of the module.

Both preshuffling and postshuffling of data can be easily done through the table look-up technique. In a typical pipeline design, input data and output data are normally buffered. Hence, if the sequence of addresses can be generated in such a way that the input or the output data are fetched in the desired order, then both the preshuffle and the postshuffle can be achieved. As the transform size is typically fixed and small, the desired address sequence can be precomputed and stored in a small table. In such a case, appropriate data can be fetched by indirect addressing.
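As an illustration of the precomputed address table, the following sketch builds the input fetch order for the 11-point DCT with g = 2. The names are hypothetical, and the table layout (one address per i = 1, ..., 10, composed from the two index mappings) is an assumption about how such a table could be organized rather than the layout used on the chip.

```python
def preshuffle_table(p: int, g: int) -> list:
    """Addresses of y to fetch for i = 1..p-1: ((p-1)/2 - g^i mod p) mod p."""
    half = (p - 1) // 2
    return [(half - pow(g, i, p)) % p for i in range(1, p)]

table = preshuffle_table(11, 2)
print(table)  # [3, 1, 8, 0, 6, 7, 9, 2, 10, 4]

# Indirect addressing: the buffer is read through the table, so the
# operand pairs of the pre-addition stage come out side by side.
pairs = [(table[i], table[i + 5]) for i in range(5)]
print(pairs)  # [(3, 7), (1, 9), (8, 2), (0, 10), (6, 4)]
```

The printed pairs are the same input pairs that appear in the expression for Y(0) in the 11-point example.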

The kernel-processing unit basically consists of two convolvers. Both convolvers are realized with the distributed arithmetic. Fig. 3 shows the implementation of a 5-point convolver, which can be used in the VLSI realization of an 11-point unified DCT/IDCT chip. The two convolvers differ from each other in both their address generators and their look-up tables stored in ROMs. In this example, the internal word length and the word lengths of data {x(i)} and {X(k)} are, respectively, 12, 8, and 12 bits. Note that these parameters can always achieve a signal-to-noise ratio of greater than 44 dB under the simulation test.


obtain the final result. The circular buffer advances 6 bits and repeats the foregoing procedures until all results are obtained. This completes a full convolution cycle and starts another one by loading another input sequence one clock cycle later. In such a case, the circular buffer rotates 6 bits every clock cycle. Hence, the address generator can be implemented with six independent 10-bit bit-

Fig. 1. Block diagram of the row-column approach for 2-D DCT/IDCT.

The two kernel matrices are then respectively identical to the two kernel matrices used for the realization of [A(5), A(3), A(4), A(2), A(1)]^T and [-B(5), B(3), -B(4), -B(2), -B(1)]^T in the DCT realization. Hence, whether the chip is configured to perform a DCT or an IDCT, no modification of the convolvers is necessary. Consequently, nearly no silicon area of the chip is idle in a particular transform, and a highly efficient unified chip can be implemented.

Furthermore, as shown in Figs. 1 and 2, the convolvers are the core units of the unified chip and the whole chip involves no multiplier. Since the convolutions are reformulated at the bit level by using the distributed arithmetic, the following advantages can be achieved: 1) no actual multiplication is involved, as multipliers are replaced by memory look-up tables; 2) high accuracy, as the structure suffers fewer rounding/truncation errors than other structures; 3) modular circuit design is possible, as the structure is extremely regular; and 4) the structure is simple, which saves gate count and makes routing easy. These features allow a high-speed circuit design composed of memories, adders, and registers only.

The proposed design aims to achieve a throughput rate of 1 output per clock cycle. Obviously, the two convolution modules play a significant role in the unified chip and dominate the timing performance of the whole chip. By making use of the current 2-μm CMOS technology, the proposed architecture can easily meet the speed requirement of 14.3-MHz real-time operation.

V. CONCLUSIONS

In this paper, we propose a new algorithm to realize an odd prime P-length DCT with two half-length convolutions (one cyclic convolution and one skew-cyclic convolution). This algorithm can be easily modified to realize an IDCT with odd prime length. In such a case, one can realize both DCT and IDCT with the same convolution module if they have the same length. As the operations other than the convolutions required for realizing either DCT or IDCT are just 2(P - 1) additions and some simple permutations, only a small percentage of the unified chip is idle in a particular transform. Hence, one can design a very efficient unified chip. Furthermore, by making use of the distributed arithmetic, the VLSI implementation of the convolution module can result in a very simple and modular structure without multipliers. In other words, an efficient unified DCT/IDCT chip which involves only adders, latches, and memory tables can be implemented in a very straightforward way. These algorithms can also be easily extended to realize a multidimensional DCT/IDCT by using the row-column decomposition technique. A 2-D 11 × 11 unified DCT/IDCT chip design is also proposed in this paper. The proposed architecture can easily meet the speed requirement of 14.3-MHz real-time operation with the current 2-μm CMOS technology.

REFERENCES

[1] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Computers, vol. C-23, pp. 90-94, 1974.
[2] P. A. Wintz, "Transform picture coding," Proc. IEEE, vol. 60, pp. 809-820, July 1972.
[3] M. J. Narasimha and A. M. Peterson, "On the computation of the discrete cosine transform," IEEE Trans. Commun., vol. COM-26, pp. 934-936, June 1978.
[4] Z. Wang, "On computing the discrete Fourier and cosine transforms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 1341-1344, Oct. 1985.
[5] M. Vetterli and H. Nussbaumer, "Simple FFT and DCT algorithms with reduced number of operations," Signal Processing, vol. 6, pp. 267-278, Aug. 1984.
[6] H. S. Hou, "A fast recursive algorithm for computing the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1455-1461, Oct. 1987.
[7] B. G. Lee, "A new algorithm to compute the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1243-1245, Dec. 1984.
[8] P. Duhamel and H. Hmida, "New 2^n DCT algorithms suitable for VLSI implementation," in Proc. ICASSP-85, pp. 780-783, Mar. 1985.
[9] Y. H. Chan and W. C. Siu, "Algorithm for prime length discrete cosine transform," Electron. Lett., vol. 26, pp. 206-208, Feb. 1990.
[10] ---, "A new convolution structure for the realization of discrete cosine transform," in Proc. ISCAS'90, pp. 2373-2376, May 1990.
[11] W. Li, "A new algorithm to compute the DCT and its inverse," IEEE Trans. Signal Processing, vol. 39, pp. 1305-1313, June 1991.
[12] N. I. Cho and S. U. Lee, "Fast algorithm and implementation of 2-D discrete cosine transform," IEEE Trans. Circuits Syst., vol. 38, pp. 297-305, Mar. 1991.
[13] S. C. Chan, "Efficient index mapping for computing discrete cosine transform," Electron. Lett., vol. 25, pp. 1499-1500, Oct. 1989.
[14] B. G. Lee, "Input and output index mappings for a prime-factor-decomposed computation of discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-37, pp. 237-244, Feb. 1989.
[15] H. J. Nussbaumer, Fast Fourier Transform and Convolution Algorithms. New York: Springer-Verlag, 1982.
[16] S. A. White, "Applications of distributed arithmetic to digital signal processing: A tutorial review," IEEE ASSP Mag., vol. 6, pp. 4-19, July 1989.
[17] O. Ersoy, "Semisystolic array implementation of circular, skew circular, and linear convolutions," IEEE Trans. Computers, pp. 190-196, 1985.
[18] ---, "A two-stage representation of DFT and its applications," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 825-831, June 1987.
[19] C. M. Rader, "Discrete Fourier transforms when the number of data samples is prime," Proc. IEEE, vol. 56, pp. 1107-1108, June 1968.
[20] M. T. Sun, L. Wu, and M. L. Liou, "A concurrent architecture for VLSI implementation of discrete cosine transform," IEEE Trans. Circuits Syst., vol. CAS-34, pp. 992-994, Aug. 1987.
[21] M. Maruyama, H. Uwabu, I. Iwasaki, H. Fujiwara, T. Sakaguchi, M. T. Sun, and M. L. Liou, "VLSI architecture and implementation of a multi-function forward/inverse discrete cosine transform processor," in Proc. SPIE, pt. 1, pp. 410-417, Oct. 1990.
[22] N. Demassieux, G. Concordel, J. P. Durandeau, and F. Jutand, "Optimized VLSI architecture for a multiformat discrete cosine transform," in Proc. ICASSP87, pp. 547-550, Apr. 1987.
[23] A. M. Gottlieb, M. T. Sun, and T. C. Chen, "Video rate 16 multiplied by 16 discrete cosine transform IC," in Proc. IEEE 1988 Custom Integrated Circuits Conf., pp. 8.2/1-4, May 1988.
[24] A. Artieri, S. Kritter, F. Jutand, and N. Demassieux, "A one chip VLSI for real time two-dimensional discrete cosine transform," in Proc. ISCAS88, pp. 701-704, June 1988.
[25] J. C. Carlach, P. Penard, and J. L. Sicre, "TCAD: A 27 MHz 8 × 8 discrete cosine transform chip," in Proc. ICASSP89, pp. 2429-2432, May 1989.
[26] T. C. Chen, A. Gottlieb, and M. T. Sun, "VLSI implementation of a 16 × 16 DCT," in Proc. ICASSP88, pp. 1973-1976, Apr. 1988.

Yuk-Hee Chan (S'89) received the B.Sc. (Hons) degree in electronics from the Chinese University of Hong Kong in 1987. He is now working towards the Ph.D. degree in the Department of Electronic Engineering, Hong Kong Polytechnic, Kowloon, Hong Kong.
His research interests include fast computational algorithms, signal processing, image compression, and VLSI techniques.

Wan-Chi Siu (S'77-M'77-SM'90) received the associateship in electronic engineering from Hong Kong Polytechnic, the M.Phil. degree in electronics from the Chinese University of Hong Kong, and the Ph.D. degree in digital signal processing from the Imperial College of Science, Technology and Medicine, London.
Between 1975 and 1980 he was with the Chinese University of Hong Kong, where he was an electronic engineer before he left the Department of Electronics. He joined Hong Kong Polytechnic in 1980, initially as a lecturer, then as senior lecturer, and then as a principal lecturer. He is presently a Reader and the Leader of the Computer Engineering Section of the Department of Electronic Engineering, and is also the Chairman of the Departmental Research Committee. He has published more than 80 research papers. His research interests include digital signal processing, transforms, fast computational algorithms, high-performance computer architecture, parallel processing, and fast techniques for image processing and pattern recognition.
Dr. Siu was the Chairman of the Technical Program Committee of the 1987 IEEE Asian Electronic Conference and was also the Chairman of the Technical Program Committee of the 1989 International Symposium on Computer Architecture Digital Signal Processing organized by the IEE Hong Kong Center. He was a co-chairman of the Technical Program Committee of the IEEE Region 10 Conference on Computer and Communication Systems that was held in Hong Kong in September 1990, and is now the Chairman of the IEEE Hong Kong Chapter of Signal Processing. He is also a chartered engineer.

