06/23/2022
Dec 31, 2015
04/19/2023
HARDWARE OPTIMIZED DCT-IDCT IMPLEMENTATION ON VERILOG HDL
RAHUL SRIKUMAR
ECE734: VLSI ARRAY STRUCTURES FOR DSP
05/10/13
Contents
• Algorithm
• Implementations
• Performance
• Results
• Conclusion
• Future Work
Algorithm
• 8-point DCT
• 2D DCT = C*X*Transpose(C)
• C – coefficient matrix

C =
   91   91   91   91   91   91   91   91
  126  106   71   25  -25  -71 -106 -126
  118   49  -49 -118 -118  -49   49  118
  106  -25 -126  -71   71  126   25 -106
   91  -91  -91   91   91  -91  -91   91
   71 -126   25  106 -106  -25  126  -71
   49 -118  118  -49  -49  118 -118   49
   25  -71  106 -126  126 -106   71  -25
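The entries of C are consistent with the standard 8-point DCT-II basis scaled by 128 and rounded to the nearest integer. The slides do not state how C was derived, so the scale factor, the function name, and the 1/sqrt(2) weight on the first row are assumptions in this minimal sketch.

```python
import math

def dct_coefficient_matrix(n=8, scale=128):
    """Integer DCT-II matrix: round(scale * c_k * cos((2j + 1) * k * pi / (2n))).

    Assumption: scale = 128 and c_0 = 1/sqrt(2), c_k = 1 otherwise.
    """
    rows = []
    for k in range(n):
        c_k = 1 / math.sqrt(2) if k == 0 else 1.0
        rows.append([
            round(scale * c_k * math.cos((2 * j + 1) * k * math.pi / (2 * n)))
            for j in range(n)
        ])
    return rows

C = dct_coefficient_matrix()
for row in C:
    print(" ".join(f"{v:5d}" for v in row))
```

Only seven distinct magnitudes appear (126, 118, 106, 91, 71, 49, 25), which is what makes a small coefficient store practical.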
Algorithm (Cont’d)
• 1D DCT = C*X
• 2D DCT = (1D DCT) * Transpose(C)
• 1D IDCT = Transpose(C) * 2D DCT
• 2D IDCT = (1D IDCT) * C
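The row-column decomposition can be sketched in software as two matrix products per transform, following 2D DCT = C*X*Transpose(C) from the Algorithm slide. This is an illustrative model, not the Verilog design: the helper names are hypothetical, and the final division by 2^32 is an assumption that follows from C being roughly 256 times an orthonormal DCT matrix.

```python
import math

def coeff_matrix(n=8, scale=128):
    # Assumed construction of C: DCT-II basis scaled by 128 and rounded.
    return [[round(scale * (1 / math.sqrt(2) if k == 0 else 1.0)
                   * math.cos((2 * j + 1) * k * math.pi / (2 * n)))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

C = coeff_matrix()
CT = transpose(C)

def dct2d(x):
    one_d = matmul(C, x)        # 1D DCT = C * X
    return matmul(one_d, CT)    # 2D DCT = (C * X) * Transpose(C)

def idct2d(y):
    one_d = matmul(CT, y)       # 1D IDCT = Transpose(C) * Y
    return matmul(one_d, C)     # 2D IDCT = (Transpose(C) * Y) * C

def normalise(m, shift=32):
    # Assumed scaling: forward plus inverse gain is about 2**32.
    return [[round(v / (1 << shift)) for v in row] for row in m]

block = [[10] * 8 for _ in range(8)]   # constant 8*8 test block
y = dct2d(block)
rec = normalise(idct2d(y))
```

For a constant block only the DC coefficient y[0][0] is nonzero and the block is recovered after rounding. For arbitrary blocks the reconstruction is approximate, because the coefficients in C are rounded.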
Implementations Part 1
• Input word length – 8 bits
• 1D DCT internal word length – 11 bits
• 2D DCT output word length – 9 bits
• 2D IDCT output word length – 8 bits
• 4 implementations were evaluated:
1. Serial In (SI) – 1 pixel at a time
2. 2 Parallel In (2PI) – 2 pixels at a time
3. 4 Parallel In (4PI) – 4 pixels at a time
4. 8 Parallel In (8PI) – 8 pixels at a time
Implementations Part 2
• 8 registers of 8 bits each are used for coefficient storage.
• This is far more efficient than the 64 registers otherwise required for an 8*8 DCT/IDCT computation.
• 2 RAMs, each with 64 locations (8 bits wide), are used.
• The RAMs are enabled in the order:
en_ram1_write -> (en_ram1_read, en_ram2_write) -> en_ram2_read
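The enable ordering can be modelled as three phases over one 64-sample block. The slides give only the order of the enables, so everything else here is an assumption made for illustration: the function name, the column-wise (transposed) read of RAM 1, and the optional `stage` callback standing in for whatever processing sits between the two RAMs.

```python
def process_block(samples, stage=lambda v: v):
    """Model the three enable phases for one block of 64 samples."""
    assert len(samples) == 64
    ram1 = [0] * 64
    ram2 = [0] * 64
    trace = []

    # Phase 1: en_ram1_write. Fill RAM 1 in row-major order.
    trace.append(("en_ram1_write",))
    for addr, value in enumerate(samples):
        ram1[addr] = value

    # Phase 2: en_ram1_read and en_ram2_write are active together.
    # Assumption: RAM 1 is read column-wise, so RAM 2 receives the transpose.
    trace.append(("en_ram1_read", "en_ram2_write"))
    for addr in range(64):
        row, col = divmod(addr, 8)
        ram2[addr] = stage(ram1[col * 8 + row])

    # Phase 3: en_ram2_read. Stream the result out.
    trace.append(("en_ram2_read",))
    out = [ram2[addr] for addr in range(64)]
    return out, trace

out, trace = process_block(list(range(64)))
```

With the identity `stage`, `out` is the transpose of the input block, which is the role a transposition memory plays between the two 1D passes.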
Performance 1
1. Serial In (1 pixel at a time)
• Read 8 inputs = 8 cycles
• Register 8 inputs + sign extension = 1 cycle
• Add/Sub = 1 cycle
• Absolute value = 1 cycle
• Multiplication = 1 cycle
• Final addition = 2 cycles
• Total = 14 cycles
Performance 2
2. 2 Parallel In (2 pixels at a time)
• Register 8 inputs + sign extension = 4 cycles
• Add/Sub = 1 cycle
• Absolute value = 1 cycle
• Multiplication = 1 cycle
• Final addition = 2 cycles
• Total = 9 cycles
Performance 3
3. 4 Parallel In (4 pixels at a time)
• Register 8 inputs + sign extension = 2 cycles
• Add/Sub = 1 cycle
• Absolute value = 1 cycle
• Multiplication = 1 cycle
• Final addition = 2 cycles
• Total = 7 cycles
Performance 4
4. 8 Parallel In (8 pixels at a time)
• Register 8 inputs + sign extension = 1 cycle
• Add/Sub = 1 cycle
• Absolute value = 1 cycle
• Multiplication = 1 cycle
• Final addition = 2 cycles
• Total = 6 cycles
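The per-stage cycle counts on the four Performance slides can be tallied as follows. The stage names and dictionary layout are illustrative only; the numbers are taken directly from the slides.

```python
# Cycles spent bringing 8 inputs into the datapath, per implementation.
# Serial In needs 8 read cycles plus 1 cycle to register and sign-extend.
INPUT_CYCLES = {"SI": 8 + 1, "2PI": 4, "4PI": 2, "8PI": 1}

# Stages shared by all four implementations.
COMMON_STAGES = {
    "add_sub": 1,
    "absolute_value": 1,
    "multiplication": 1,
    "final_addition": 2,
}

def latency(mode):
    """Total cycles for one 8-point transform in the given input mode."""
    return INPUT_CYCLES[mode] + sum(COMMON_STAGES.values())

totals = {mode: latency(mode) for mode in INPUT_CYCLES}
print(totals)
```

The arithmetic stages cost the same 5 cycles in every case, so the four designs differ only in how long the input phase takes.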
Synthesis
• Target Platform: Altera Cyclone IV GX FPGA
• Tool Used: Quartus II
• Language Used: Verilog
Results 1
[Figure: bar chart "Combinational Blocks" (axis 5600 to 6400) comparing 8 Parallel In, 4 Parallel In, 2 Parallel In, and Serial In]
• Serial In has the lowest synthesized combinational area because it needs the fewest wires to feed in the data.
Results 2
• Serial In has the lowest synthesized register area because it needs the fewest storage elements and counters to process the data.
[Figure: bar chart "Registers" (axis 4520 to 4720) comparing 8 Parallel In, 4 Parallel In, 2 Parallel In, and Serial In]
Results 3
• 8 Parallel In takes 236 cycles, compared with 246 for Serial In.
[Figure: bar chart "Total Computation Time", cycles to 2D IDCT of an 8*8 block (axis 230 to 246), comparing 8 Parallel In, 4 Parallel In, 2 Parallel In, and Serial In]
Conclusion
• Serial In occupies ~6% less area than 8 Parallel In, at a comparatively smaller performance cost (~4% more cycles).
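The ~4% figure can be checked against the cycle counts reported in Results 3 (236 cycles for 8 Parallel In, 246 for Serial In). The function name is illustrative. The area figure cannot be re-derived the same way, because the exact block and register counts are not given in the text.

```python
def percent_increase(baseline, value):
    """Relative increase of value over baseline, in percent."""
    return 100.0 * (value - baseline) / baseline

# Cycle counts from the Results 3 slide.
slowdown = percent_increase(baseline=236, value=246)
print(f"Serial In needs {slowdown:.1f}% more cycles than 8 Parallel In")
```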