Entropy Coding on a Programmable Processor Array for Multimedia SoC Roberto R. Osorio and Javier D. Bruguera University of Santiago de Compostela. SPAIN Dept. Electronic and Computer Engineering e-mail: (roberto,bruguera)@dec.usc.es

Dec 20, 2015

Page 1: USC 2007 Entropy Coding on a Programmable Processor Array for Multimedia SoC Roberto R. Osorio and Javier D. Bruguera University of Santiago de Compostela.

Entropy Coding on a Programmable Processor Array for Multimedia SoC

Roberto R. Osorio and Javier D. Bruguera

University of Santiago de Compostela. SPAIN

Dept. Electronic and Computer Engineering

e-mail: (roberto,bruguera)@dec.usc.es


USC 2007 | Roberto R. Osorio – ASAP 2007

Outline

Entropy coding
  Relevance
  Complexity

Options for implementation
  Application-specific accelerators
  Reconfigurable instruction-set extensions
  Programmable processors

ASIPs
  Our proposal as a processor array
  Implementation view

Implementation details

Results and conclusions


Entropy coding

Lossless data compression
  More probable symbols (events) → short codewords
  Less probable symbols → long codewords

It is a critical task in implementing multimedia standards
  It is more than just Huffman or arithmetic coding
    • Zig-zag, run-length, binarization, context selection, ...
  Focusing just on pure entropy coding yields poor acceleration
  In JPEG 2000 it represents more than 50% of the computations
  In other standards it is just 5-10%, however...
    • 10% can be a lot in video encoding
    • It does not benefit from SIMD or MIMD due to:
        Data dependencies
        Bit-level operations
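
The rule "more probable symbols → short codewords" can be illustrated with a small Huffman construction. This is only a sketch: the four-symbol alphabet and its probabilities are made-up assumptions, and the function name `huffman_code_lengths` is hypothetical rather than anything from the talk.

```python
import heapq

def huffman_code_lengths(probabilities):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    # Heap entries are (subtree probability, unique tie-breaker, {symbol: depth}).
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol in them gets one bit deeper.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical source: "a" is the most probable symbol.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)
average_bits = sum(probs[s] * lengths[s] for s in probs)
print(lengths, average_bits)
```

With these probabilities the most likely symbol receives a 1-bit codeword and the two rarest receive 3-bit codewords, so the average is below the 2 bits a fixed-length code would need.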


Options for implementation

Application-specific hardware
  Highest performance
    • High throughput, low latency and low power consumption
    • Optimized integration reduces latency and cost
  Painful design process
    • Skilled engineers needed
    • Complex implementation. Errors may show up after taping out
    • No flexibility: one design → one or two applications

Reconfigurable instruction-sets or accelerators
  High flexibility: one application → one design
    • Errors can be corrected at (almost) any time
  Still, many times slower, bigger and more power hungry than an ASIC
  Painful design process
    • Skilled engineers needed
    • Benefits of accelerating small kernels limited by Amdahl's law
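
The Amdahl's law limit mentioned above can be made concrete with a short calculation. The sketch below uses the shares quoted earlier in the talk (about 10% of the computations in some standards, more than 50% in JPEG 2000); the 10x kernel speed-up and the function name `amdahl_speedup` are illustrative assumptions.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speed-up when `fraction` of the run time is accelerated by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)

# Entropy coding as 10% of the workload, accelerated 10x (assumed figure).
small_kernel = amdahl_speedup(0.10, 10.0)
# Entropy coding as 50% of the workload, same assumed 10x acceleration.
large_kernel = amdahl_speedup(0.50, 10.0)
print(round(small_kernel, 3), round(large_kernel, 3))
```

Accelerating a 10% kernel tenfold improves the whole application by under 10%, which is why accelerating only the pure entropy-coding kernel gives a poor overall gain.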


Options for implementation (2)

Programmable processors
  Limited performance, high power consumption
  Several choices
    • Scalar processors → poor performance
        You get what you paid for
    • Superscalar → high power consumption
        Diminishing returns
    • VLIW → something in between
        Preferred choice for implementing multimedia systems
        Performance suffers due to data dependencies
  Best flexibility
    • One design → any application
    • Changes can be applied in the field


Entropy coding on programmable processors

Example application
  Context-adaptive Binary Arithmetic Coder (CABAC) in H.264
    • Data binarization
    • Context selection and updating
    • Binary arithmetic coding
    • Bit-stream formation
  The number of operations in high-quality encoding scenarios is overwhelming!

[Figure: a bit-stream (00111010001 10000011101) of 50-200 M symbols/s, at ~50 ops/symbol on a RISC or VLIW processor, requires 2.5-10 Gops!!]
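
The figures on this slide are consistent with a simple product, reading them as a range of 50 to 200 M symbols/s at roughly 50 operations per symbol. The sketch below only redoes that arithmetic; the function name `required_gops` is hypothetical.

```python
def required_gops(symbols_per_second, ops_per_symbol):
    """Operations per second, in Gops, needed to keep up with the symbol rate."""
    return symbols_per_second * ops_per_symbol / 1e9

OPS_PER_SYMBOL = 50  # approximate figure from the slide
low = required_gops(50e6, OPS_PER_SYMBOL)    # 50 M symbols/s
high = required_gops(200e6, OPS_PER_SYMBOL)  # 200 M symbols/s
print(low, high)
```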


Hardware-software co-design

Need for efficient implementations
  Processing speed
  Power consumption

MPEG-4 encoder, VGA resolution @ 30 fps: 4.1 GIPS

[Figure: design-space exploration between HW and SW, on an axis from 0 RISC to 21 RISC. SW: low cost, greater flexibility. Labels as extracted:]
  SW: 5 RISC, 4 threads
  Coproc: Clip, Div, Abs, Sgn (88% utilization)
  HW (80% performance): DCT, SAD, BDIFF, BADD, BQ, BIQ

  HW (65% performance): DCT, SAD
  SW: 15 RISC, 16 threads (75% utilization)
  Coproc: Clip, Div, Abs, Sgn

Source: Pierre Paulin, ST Microelectronics, Euromicro DSD 2004


Motivation for a new platform

Devices Formats

JPEG

GIF

PNG

TIFF

JPEG 2000

MPEG-1

MPEG-2

MPEG-4 SP

H.264

WMV

QuickTime

PDF …

Algorithms

Huffman

Q-Coder

QM-Coder

MQ-Coder

CABAC

Rice

Golomb

Exp-Golomb

Lempel-Ziv

Run-length …

Applications

Image visualization

Video playing

Music

Sound recording

Still digital cameras

Video cameras

Digital TV

Time shifting

Multiple tuners

Continuous recording


Motivation for a new platform

Increasing complexity

[Figure: chart of software size (thousands of lines) and design effort (engineers x month) from the 1990's through 2002 and 3G to 2010. Data labels legible in the extraction: 5, 13, 50, 146, 350, 1022, 500500, 1637, 1000, 1500. Source: TI 2002]

Support multiple standards, services and applications
  + Complexity grows quadratically with the size of the problem
  + Implementation for heterogeneous platforms


ASIP

Application-Specific Instruction-set Processor
  Tailored to a given range of applications
  Best performance and lower cost for a programmable processor
  Still retains high flexibility

Design process
  From scratch
  From a base processor
    • Profiling
    • Adding new instructions / removing unused ones
    • Adding / removing functional units
    • Tailoring instruction format and signal widths
  Other alternatives
    • Tensilica


Our ASIP implementation

Array of low-cost processors
  8-bit processors
  2-stage pipeline: fetch/decode and execute
  2 instructions per cycle in a VLIW fashion
  Each processor has its own data and code memories

Communication through queues
  A linear structure has been found to be sufficient so far

Global memory accessed through a shared bus

[Figure: linear array of four processors (P), each with its own local memory (mem)]
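
The linear array with queue-based communication can be modeled in software as a chain of workers joined by bounded FIFOs. This is a behavioral sketch only, not the authors' hardware: threads stand in for processors, the queue depth of 2 and the three arithmetic stages are arbitrary assumptions, and the names `processor` and `run_linear_array` are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the input stream

def processor(stage, inbox, outbox):
    """Model one array element: read a token, process it, pass it on."""
    while True:
        item = inbox.get()          # blocks (stalls) while the input queue is empty
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(stage(item))     # blocks (stalls) while the output queue is full

def run_linear_array(stages, data, queue_depth=2):
    """Push `data` through a linear chain of stages connected by bounded queues."""
    queues = [queue.Queue(maxsize=queue_depth) for _ in range(len(stages) + 1)]

    def feed():
        for item in data:
            queues[0].put(item)
        queues[0].put(STOP)

    threads = [threading.Thread(target=feed)]
    threads += [threading.Thread(target=processor, args=(stage, queues[i], queues[i + 1]))
                for i, stage in enumerate(stages)]
    for t in threads:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

# Three made-up stages standing in for successive steps of a coding pipeline.
outputs = run_linear_array([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], range(5))
print(outputs)
```

Because each stage only talks to its neighbours, no global synchronization is needed; a slow stage simply makes the stages around it wait on a full or empty queue.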


Architecture

[Figure: processor block diagram. Blocks: program local memory, fetch & decoding, flow control, pipeline registers, registers bank, data local memory. Datapaths are labeled 8.]


Instruction set

8-bit instructions
  add and sub with and without carry
  and, or, exor
  left and right shift and rotation (only 1 bit each time)
  conditional (zero, carry) and unconditional branch
  memory load and store
  data and code prefetch
  queue input and output

16-bit instructions: carry bit passes to the next ALU

We do not implement
  call and return
    • put an address in the queue for next processor
    • jump to an address in the queue
  stack management
  interrupts
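
The remark that the carry bit passes to the next ALU for 16-bit operations can be sketched as two chained 8-bit additions. This models the arithmetic only; how the two ALUs are actually wired is not described in the slides, and `add8` and `add16` are hypothetical helper names.

```python
def add8(a, b, carry_in=0):
    """8-bit add with carry: returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def add16(x, y):
    """16-bit add built from two 8-bit adds, the carry rippling from low to high byte."""
    low, carry = add8(x & 0xFF, y & 0xFF)        # first ALU: low bytes
    high, carry = add8(x >> 8, y >> 8, carry)    # second ALU: high bytes plus carry
    return (high << 8) | low, carry

print(add16(0x12FF, 0x0001))  # low byte overflows, carry reaches the high byte
print(add16(0xFFFF, 0x0001))  # wraps around with a final carry out
```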


Programming model

Start up
  First processor reads starting address from the queue
  Initialization subroutine puts an address for the next processor
    • After a few cycles, all processors are up

Processing
  Each processor executes a part of the code and communicates with other processors using the queues
    • Processors read the queues at specific points in their code
  Empty/full queues make processors stall
    • The same applies for data or code not present in the local memory

Switching to another subroutine
  When the work is done, processors read a new address from the queue
    • Some processors always execute the same piece of code
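
The start-up sequence described above, where each processor reads a starting address from its queue and its initialization code posts an address for the next one, can be traced with a small sequential model. Everything here is an assumption made for illustration: the address values, the `next_address` rule and the function name `start_up` do not come from the talk.

```python
from collections import deque

def start_up(num_processors, first_address, next_address):
    """Return the starting address each processor reads from its input queue."""
    queues = [deque() for _ in range(num_processors + 1)]
    queues[0].append(first_address)                  # the host seeds the first queue
    started = []
    for p in range(num_processors):
        address = queues[p].popleft()                # processor p reads its starting address
        started.append(address)
        queues[p + 1].append(next_address(address))  # init code wakes up the neighbour
    return started

# Hypothetical layout: each processor's entry point is 0x40 after the previous one.
entry_points = start_up(4, 0x00, lambda address: address + 0x40)
print([hex(a) for a in entry_points])
```

The same read-an-address-then-jump step is what replaces call and return when a processor switches to another subroutine.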


Distributing the code

[Figure: the stages Data binarization, Context modelling, Encoding iteration and Output, chained through Call/Return inside a LOOP, compared with the ideal structure of four side-by-side loops: for(…){ ….. }  for(…){ ….. }  for(…){ ….. }  for(…){ ….. }]


Case study

CABAC encoding in H.264
  Follows a pipelined structure
  Irregular algorithms
    • Not well suited for software pipelining
  Zig-zag coefficient ordering: LUT-based indirections
  Binarization: data dependencies
  Context managing: table accessing and updating
  Binary arithmetic coding: bit-level operations and data dependencies

JPEG encoding
  Zig-zag coefficient ordering: LUT-based indirections
  Token formation: data dependencies
  Huffman encoding: bit manipulation
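
The "LUT-based indirections" of zig-zag coefficient ordering can be sketched as a table of indices applied to a block stored in row-major order. The table is generated here rather than copied from a standard, and the function names `zigzag_lut` and `zigzag_scan` are hypothetical.

```python
def zigzag_lut(n):
    """Row-major indices of an n x n block listed in zig-zag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Visit anti-diagonals in turn; odd ones run top to bottom, even ones bottom to top.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in cells]

def zigzag_scan(block, lut):
    """Reorder a flat block through the table: one indirect load per coefficient."""
    return [block[index] for index in lut]

lut4 = zigzag_lut(4)
block = list(range(100, 116))   # made-up 4x4 coefficients in row-major order
scanned = zigzag_scan(block, lut4)
print(lut4)
print(scanned[:6])
```

Each output coefficient costs a table read followed by a dependent data read, which is the kind of access pattern that does not map well onto SIMD units.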


Results

Comparing with a TI TMS320C6711 VLIW DSP
  5 of our processors were used in both cases
  CABAC
    • 10 macroblocks from the 3rd frame of Foreman QCIF encoded as a P-frame with quantizer 28
  JPEG
    • 10 macroblocks from Lena image with quality level 75

          VLIW DSP    Processors array    Speed-up
  CABAC   500620      48974               10.2
  JPEG    112150      39512               2.8
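
The speed-up column follows from dividing the VLIW DSP figure by the processor-array figure. The slide does not state the unit of the two measurement columns, so the sketch below treats them as plain counts; the variable names are illustrative.

```python
# (VLIW DSP, processors array) figures as listed in the results table.
results = {"CABAC": (500620, 48974), "JPEG": (112150, 39512)}

speedups = {name: dsp / array for name, (dsp, array) in results.items()}
for name, value in speedups.items():
    print(f"{name}: {value:.1f}x")
```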


Other algorithms

We expect other encoding algorithms to perform similarly to the proposed ones:
  CAVLC in H.264
  Huffman in MPEG-2 and 4
  EBCOT in JPEG-2000, ...

Decoding presents serious data dependencies
  We have studied CABAC decoding
  We have been working on reducing the impact of data dependencies
  At this moment we do not have:
    • A whole implementation
    • An efficient implementation on another platform to compare with


Other algorithms

[Figure: stages assigned to the processors for six applications. Stage labels, in extraction order:]
  Zig-Zag quantization
  Run-length, coefficients processing
  Huffman encoding
  Bit-stream formation
  EBCOT 1.1
  EBCOT 1.2
  Context modeling
  Encoding iteration
  Bit-stream formation
  Contexts modeling
  Encoding iteration
  Bit-stream formation
  Zig-Zag quantization
  Significance map, significant coefficients
  EBCOT 1.1, context modeling
  EBCOT 1.2
  Bit-stream parsing
  Arithmetic decoding
  Bit-stream parsing
  Contexts modeling, coefficients reconstruction
  Arithmetic decoding
  Zig-Zag de-quantization
  Bit-stream parsing
  Coefficients reconstruction
  Huffman decoding
  Zig-Zag de-quantization

Applications shown:
  CABAC encoding (H.264)
  JPEG encoder
  JPEG 2000 encoder
  CABAC decoding (H.264)
  JPEG decoder
  JPEG 2000 decoder


Data dependencies in the decoder

[Figure: code boxes labeled with the decoder stages (Data reconstruction, Context modelling, Decoding iteration) and the encoder stages (Data binarization, Context modeling, Encoding iteration, Output), with LOOP markers. The boxes contain only placeholder text.]


Work around

  data_reconstruction(…){
    …
    do{
      context_modeling(…)
      …
    }
    …
  }

  context_modeling(…){
    …
    …
    decoding_iteration(…)
    …
    …
  }

  decoding_iteration(…){
    …
    …
    …
    …
  }

  data_reconstruction(…){
    do{
      context_modeling(…)
      use_value
    }
  }

  context_modeling(…){
    decoding_iteration(…)
    use_value
  }

  decoding_iteration(…){
  }

INLINING

  data_reconstruction(…){
    do{
      // context_modeling
      decoding_iteration(…)
      use_value
    }
  }

  decoding_iteration(…){
  }

CODE REDISTRIBUTION

  data_reconstruction(…){
    do{
      decoding_iteration(…)
      use_value
    }
  }

  decoding_iteration(…){
  }


Applications

  bzr 100
  input $2
  output $4
  xor $0 $0
  add $0 1
  output $2
  fetch $4
  and $4 7
  add $1 4
  sl0 $4
  add $4 $5

  begin
    -- registers clocking
    SYNC: process (clk, reset)
    begin
      if(clk'event and clk = '1') then
        if(reset = '1') then
          codigoOutReg <= "0000";
          numSeqOutReg <= "000";
          calcSreg <= "0000000000000000";
          calcCreg <= "0000000000000000";
          shiftOutReg <= "000";

+ ASIC

FPGA
coarse grain

media processor


Implementation issues

[Figure: large arrays of processors (P), each with its local memory (mem)]

Yield

Utilization

Power
  • Reduce voltage
  • Reduce clock frequency


An ASIP-based media-processor

[Figure: three views of a media processor with I/O blocks, memories (Mem), a processor (P) and a 2 x 4 array of elements:]
  ASIP    ASIP   ASIP    ASIP
  ASIP    ASIP   ASIP    ASIP

  ASIP    DCT    ME      ME
  Encod.  Filter ME      ME

  ASIP    iDCT   ASIP    ASIP
  Decod.  MC     Filter  ASIP


Implementation results

Area and speed figures for the proposed processor using AMS 0.35µ libraries

Area (nand gates) 5673

Clock speed (MHz) 180

Registers cost (bits) 232

Latency (cycles) 2

Maximum throughput (instr/cycle) 2

Code memory (KB) 1

Data memory (KB) 1
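
The clock speed and maximum throughput above give a peak instruction rate by simple multiplication. The sketch assumes ideal conditions (no stalls on queues or memory) and, for the array figure, the five-processor configuration used in the results; `peak_mips` is a hypothetical helper name.

```python
CLOCK_MHZ = 180        # clock speed from the table
INSTR_PER_CYCLE = 2    # maximum throughput from the table

def peak_mips(num_processors=1):
    """Peak rate in millions of 8-bit instructions per second, ignoring stalls."""
    return CLOCK_MHZ * INSTR_PER_CYCLE * num_processors

print(peak_mips())    # one processor
print(peak_mips(5))   # five-processor array
```

These are upper bounds on 8-bit instructions, so they are not directly comparable with the instruction rate of a wider DSP.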


Comparison

                      Processors array    TI C6711
  Registers (bits)    1672                > 2048
  ALU (bits)          80                  256
  Local memory (KB)   10                  8+64
  Speed (MHz)         180                 150
  Technology          AMS 0.35µ           TI 0.13µ

Approximate comparison of the hardware cost of a 5-element processor array and a TI C6711 VLIW DSP


Conclusions

Entropy coding is a complex task in multimedia applications that often needs hardware acceleration

The implementation cost and lack of flexibility of hardware accelerators demand programmable solutions with comparable performance

ASIPs are an intermediate solution between hardware accelerators and general-purpose processors

In this work an ASIP is proposed for entropy encoding

This ASIP is not based on optimized new instructions but on achieving high parallelism in computations and data flow

Results demonstrate that this is a valid approach for the applications we have studied

We intend to extend the results to other applications