DCT HSRA Implementation Joseph Yeh December 3, 1998.

DCT HSRA Implementation

Joseph Yeh

December 3, 1998

Outline

• Introduction / Overview
• Rationale / Prior Art
• Implementation Strategy
• Quality / Capacity
• SCORE
• Power

Overview

• Backbone of Image Compression Standards
  • JPEG, MPEG
  • still image compression of 8-10:1
• Similarity Transform on Image Data
  • B = C A C^T
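The similarity transform above can be sketched in a few lines. This is a minimal illustration, assuming C is the orthonormal 8x8 DCT-II matrix and A is an 8x8 block of samples; the helper names (`dct_matrix`, `dct2d`, `idct2d`) are hypothetical and not taken from the slides.

```python
import math

N = 8  # block size assumed from the 8x8 blocks mentioned in the talk

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C (assumed normalization: C times C^T is the identity)."""
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return c

def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(row) for row in zip(*x)]

def dct2d(a):
    """Forward transform B = C A C^T."""
    c = dct_matrix(len(a))
    return matmul(matmul(c, a), transpose(c))

def idct2d(b):
    """Inverse transform A = C^T B C, valid because C is orthonormal."""
    c = dct_matrix(len(b))
    return matmul(matmul(transpose(c), b), c)

# A flat block puts all of its energy into the DC coefficient B[0][0].
flat = [[100.0] * N for _ in range(N)]
coeffs = dct2d(flat)
restored = idct2d(coeffs)
```

With this normalization a constant block of value 100 yields a DC term of 800 and zeros elsewhere, which is why smooth image regions compress well.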

Focus of Implementation

• 1D DCT
• Needs to be done 16 times on every 8x8 image block
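The count of 16 follows from the usual row-column decomposition: 8 one-dimensional transforms over the rows, then 8 over the columns. The sketch below assumes that decomposition and uses a direct (slow) 1D DCT purely for illustration; `dct_1d` and `dct_2d_row_column` are hypothetical names.

```python
import math

def dct_1d(x):
    """Direct orthonormal 1D DCT-II of a sequence (reference form, not a fast algorithm)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[j] * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                               for j in range(n)))
    return out

def dct_2d_row_column(block):
    """2D DCT as a row pass followed by a column pass; also counts 1D transforms used."""
    passes = 0
    row_pass = []
    for row in block:
        row_pass.append(dct_1d(row))
        passes += 1
    col_pass = []
    for col in zip(*row_pass):
        col_pass.append(dct_1d(list(col)))
        passes += 1
    # col_pass holds transformed columns; transpose back to row-major order.
    result = [list(r) for r in zip(*col_pass)]
    return result, passes

block = [[100.0] * 8 for _ in range(8)]
coeffs, passes = dct_2d_row_column(block)
```

For an 8x8 block `passes` comes out as 16, so a single 1D DCT datapath is the natural unit to implement and reuse.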

Big Picture (JPEG)

Rationale

• Personal Background
• Versatile - Two Mediabench Apps
• Completely Feedforward
• Platform to Compare Wavelets

Prior Art

• Common ASIC
• Jeffrey Jacob (Toronto) Master's Thesis:
  – “Memory Interfacing for OneChip Reconfigurable Processor”
  – 2D DCT on Altera Flex10K50
• Xilinx 2D DCT implementation

Implementation

• Many fast algorithms exist
• Two provided in Mediabench JPEG
  – Loeffler, Ligtenberg, Moschytz
    • jfdctint.c
  – Arai, Agui, Nakajima
    • jfdctfst.c
• Both attempted

General Data Flow

Source: Jeffrey Jacob, Master's Thesis

General Data Flow

19 adds, 5 mults, 10 adds
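The totals on this slide (29 adds and 5 multiplies) match the operation count of the Arai, Agui, Nakajima (AAN) 1D DCT. The following is a floating-point sketch of that widely published butterfly structure, checked against a direct DCT. It is not the HSRA netlist; the function names, and the scale factors applied in `true_dct_from_aan`, are written here for illustration.

```python
import math

def aan_dct_1d(d):
    """AAN-style 8-point DCT: 29 adds, 5 multiplies; outputs are scaled DCT values."""
    t0, t7 = d[0] + d[7], d[0] - d[7]
    t1, t6 = d[1] + d[6], d[1] - d[6]
    t2, t5 = d[2] + d[5], d[2] - d[5]
    t3, t4 = d[3] + d[4], d[3] - d[4]
    out = [0.0] * 8
    # Even part
    t10, t13 = t0 + t3, t0 - t3
    t11, t12 = t1 + t2, t1 - t2
    out[0] = t10 + t11
    out[4] = t10 - t11
    z1 = (t12 + t13) * 0.707106781          # multiply 1
    out[2] = t13 + z1
    out[6] = t13 - z1
    # Odd part
    t10, t11, t12 = t4 + t5, t5 + t6, t6 + t7
    z5 = (t10 - t12) * 0.382683433          # multiply 2
    z2 = 0.541196100 * t10 + z5             # multiply 3
    z4 = 1.306562965 * t12 + z5             # multiply 4
    z3 = t11 * 0.707106781                  # multiply 5
    z11, z13 = t7 + z3, t7 - z3
    out[5] = z13 + z2
    out[3] = z13 - z2
    out[1] = z11 + z4
    out[7] = z11 - z4
    return out

def true_dct_from_aan(scaled):
    """Undo the AAN output scaling to recover orthonormal DCT-II coefficients."""
    result = []
    for k, v in enumerate(scaled):
        aan = 1.0 if k == 0 else math.sqrt(2.0) * math.cos(k * math.pi / 16)
        result.append(v / (math.sqrt(8.0) * aan))
    return result

def direct_dct(x):
    """Reference orthonormal DCT-II for comparison."""
    return [(math.sqrt(1 / 8) if k == 0 else 0.5)
            * sum(x[j] * math.cos((2 * j + 1) * k * math.pi / 16) for j in range(8))
            for k in range(8)]

sample = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
fast = true_dct_from_aan(aan_dct_1d(sample))
slow = direct_dct(sample)
```

In a JPEG pipeline the per-coefficient scaling is normally folded into the quantization table, which is part of what makes this variant cheap.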

Basic Strategy

• Whole design based on Ripple Adders
  – multipliers decomposed into Ripple Adders
  – simple to manipulate
  – placement not restricted by Cascades
  – only need to skew at input and output
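One common way to decompose a constant multiplier into adders is shift-and-add over the set bits of a fixed-point constant. The sketch below assumes that scheme with 8 fractional bits. The slides do not say which decomposition or precision was actually used, so the names and the value of `FRACTION_BITS` are illustrative only.

```python
def shift_add_terms(constant):
    """Bit positions that are set in a non-negative integer constant."""
    return [bit for bit in range(constant.bit_length()) if (constant >> bit) & 1]

def multiply_by_constant(x, constant):
    """Multiply x by a fixed constant using only shifts and adds."""
    total = 0
    for shift in shift_add_terms(constant):
        total += x << shift   # each term is a wired shift feeding one adder
    return total

FRACTION_BITS = 8                                        # assumed precision
C4_FIXED = round(0.707106781 * (1 << FRACTION_BITS))     # 181 = 0b10110101

terms = shift_add_terms(C4_FIXED)
adders_needed = len(terms) - 1        # five shifted copies need four adders
product = multiply_by_constant(100, C4_FIXED) >> FRACTION_BITS
```

Dropping low-order bits of the constant removes terms, and with them adders, which is one way precision can be traded for area.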

Spatial Implementation

Vital Statistics

• Level 11 Array - 2048 BLB’s
• 8*9 = 72 Inputs
• 2*(12)+2*(13)+4*(14) = 106 Outputs
• 547 BLB’s of Logic
  – only 26% usage
• Latency: 156 Cycles
• 8-deep retiming will add 284 BLB’s

Speedup

• Feed-forward implies a set of outputs every cycle
• “gcc -O3” compilation of jfdctfst.c
• on Sun Ultrasparc 10
• gives 0.2 seconds for 1048576 1D DCT’s
• 190 ns per DCT => Speedup ratio of 47

Speedup = (250 MHz)*(MP cycles per DCT)/(Rate of MP)
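The arithmetic behind the speedup figure can be checked directly. This sketch assumes the HSRA delivers one 1D DCT per cycle at 250 MHz, as the feed-forward bullet implies, and takes the microprocessor time per DCT from the measured run; the variable names are illustrative.

```python
HSRA_CLOCK_HZ = 250e6        # clock rate from the speedup formula
TOTAL_SECONDS = 0.2          # measured software run time
DCT_COUNT = 1048576          # number of 1D DCTs in that run

mp_seconds_per_dct = TOTAL_SECONDS / DCT_COUNT    # about 190.7 ns
hsra_seconds_per_dct = 1.0 / HSRA_CLOCK_HZ        # 4 ns at one result per cycle
speedup = mp_seconds_per_dct / hsra_seconds_per_dct

print(f"{mp_seconds_per_dct * 1e9:.1f} ns per DCT, speedup {speedup:.1f}x")
```

The ratio comes out a little under 48, consistent with the slide's figure of 47 once the time per DCT is rounded down to 190 ns.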

Quality/Capacity (1)

• Choice of AAN itself a Quality decision

• Estimate of 13 BLB’s saved per bit of precision in mults

• Increased PSNR probably a fluke, but reduced capacity designs definitely worth considering

Precision   BLB's   PSNR (dB)
Base        547     32.40
Base-2      521     32.55
Base-3      503     32.57
Base-4      490     19.40
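For reference, PSNR figures like those in the table are conventionally computed from the mean squared error between the original and reconstructed samples. The sketch below assumes 8-bit samples (peak value 255); the slides do not state how the measurement was made, and `psnr` is a hypothetical helper.

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf   # identical signals
    return 10.0 * math.log10(peak * peak / mse)

reference = [10, 20, 30, 40]
off_by_one = [11, 19, 31, 39]      # every sample differs by 1, so MSE = 1
quality = psnr(reference, off_by_one)
```

Because the scale is logarithmic, the drop from about 32.5 dB to 19.40 dB at Base-4 corresponds to roughly a twentyfold increase in mean squared error.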

Base-2 Precision

Base-4 Precision

• Think in terms of whole JPEG application

• Reconsider block diagram in greater detail

SCORE: Big Detailed Picture

SCORE: Swapping Scheme

Power Statistics

• HSRA - 668 cycle run
  – Activity:
    • LUT Outputs: 0.288
    • System Inputs: 0.232
    • Total: 0.285

Power Statistics

• Ripple Adders
  – Most active LUTs are in the less significant bits of adders, especially in the input array
  – Higher order outputs flip sign frequently, causing bit toggling throughout two’s complement representation
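The sign-flip effect is easy to see by counting how many bits change between consecutive two's complement values. This is an illustrative sketch; the 12-bit width is an assumption borrowed from the narrowest output width listed earlier, and the helper names are hypothetical.

```python
def twos_complement(value, width):
    """Bit pattern of a signed integer in a two's complement word of the given width."""
    return value & ((1 << width) - 1)

def toggled_bits(previous, current, width):
    """Number of bit positions that differ between two consecutive values."""
    diff = twos_complement(previous, width) ^ twos_complement(current, width)
    return bin(diff).count("1")

WIDTH = 12  # assumed word width

small_step = toggled_bits(100, 101, WIDTH)   # same sign, only the low bit changes
sign_flip = toggled_bits(1, -1, WIDTH)       # 000000000001 -> 111111111111
```

A change from +1 to -1 toggles 11 of the 12 bits, whereas a small same-sign step toggles only one, so coefficients that hover around zero cost far more switching activity than their magnitude suggests.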

Power Statistics

• HSRA (668 cycle simulation):
  – Energy:
    • LUT Outputs: 1095.57 pJ
    • Inputs: 137.29 pJ
    • Clock Energy: 6412800 pJ
      » (300 pJ)(2^(levels-6))(cycles)
    • Total: 6414033 pJ
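The energy totals can be reproduced from the clock-energy formula on the slide. The sketch assumes a level 11 array (from the Vital Statistics slide) and 668 cycles (from the activity slide), since those values give exactly the quoted clock energy; the variable names are illustrative.

```python
LEVELS = 11                  # array level, from the Vital Statistics slide
CYCLES = 668                 # assumed run length that reproduces the clock figure
CLOCK_PJ_BASE = 300          # pJ per cycle for a level 6 array, per the slide formula

lut_output_pj = 1095.57
input_pj = 137.29
clock_pj = CLOCK_PJ_BASE * 2 ** (LEVELS - 6) * CYCLES

total_pj = lut_output_pj + input_pj + clock_pj
clock_share = clock_pj / total_pj

print(f"clock {clock_pj} pJ, total {total_pj:.0f} pJ, clock share {clock_share:.4%}")
```

Under this model the clock accounts for more than 99.9% of the total, so the logic and input energy are nearly negligible by comparison.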

Retrospect

• BOOM?
  – Current code size ~1500 lines of Java
  – Jeffrey Jacob’s RTL (2D) ~2600 lines
• Primary concern not with architecture but with backend tools
  – PostScript output of ar?

Future Directions

• Reduced precision
• Cascade-LUTs
• More “spatially” suited implementations of DCT?
  – Find one or make one up!
• IDCT

