M2: Team Paradigm :: Milestone 2 2-D Discrete Cosine Transform Group M2: Tommy Taylor Brandon Hsiung Changshi Xiao Bongkwan Kim Project Manager: Yaping Zhan

Dec 22, 2015


M2: Team Paradigm

:: Milestone 2 2-D Discrete Cosine Transform

Group M2: Tommy Taylor Brandon Hsiung Changshi Xiao Bongkwan Kim

Project Manager: Yaping Zhan


Project status

Design Proposal (Complete)

Architecture Proposal (Almost Complete)
: Algorithm description (Done)
: High level simulation (Done)
: Mapping algorithm into hardware (Done)
: Behavioral Verilog and test bench (Debugging)

Size estimates/floor plan (To be completed)
: Structural Verilog
: More accurate transistor count
: Floor plan


Design decisions

Do not include motion prediction

Go with 2-D DCT

Use SRAM

No pipelining

Will not run in real-time


Distributed algorithm of 1D DCT :

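$$A = \cos(\pi/4),\quad B = \cos(\pi/8),\quad C = \sin(\pi/8),\quad D = \cos(\pi/16)$$

$$E = \cos(3\pi/16),\quad F = \sin(3\pi/16),\quad G = \sin(\pi/16)$$

$$\begin{bmatrix} X_0 \\ X_2 \\ X_4 \\ X_6 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} A & A & A & A \\ B & C & -C & -B \\ A & -A & -A & A \\ C & -B & B & -C \end{bmatrix} \begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix}$$

$$\begin{bmatrix} X_1 \\ X_3 \\ X_5 \\ X_7 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} D & E & F & G \\ E & -G & -D & -F \\ F & -D & G & E \\ G & -F & E & -D \end{bmatrix} \begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix}$$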
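The two matrix products above can be checked against the textbook 8-point DCT. The sketch below is a Python model, not the team's C/C++ simulation; the function names and the orthonormal scaling (factor 1/2, with an extra 1/sqrt(2) on X0) are assumptions chosen to agree with the A A A A first row.

```python
import math

def dct8_definition(x):
    """8-point DCT-II from the definition (assumed orthonormal scaling)."""
    out = []
    for k in range(8):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0
        s = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
        out.append(0.5 * ck * s)
    return out

# Constants as defined on the slide.
A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
D = math.cos(math.pi / 16)
E = math.cos(3 * math.pi / 16)
F = math.sin(3 * math.pi / 16)
G = math.sin(math.pi / 16)

EVEN = [[A, A, A, A], [B, C, -C, -B], [A, -A, -A, A], [C, -B, B, -C]]
ODD = [[D, E, F, G], [E, -G, -D, -F], [F, -D, G, E], [G, -F, E, -D]]

def dct8_even_odd(x):
    """Same transform via the two 4x4 products on sums and differences."""
    sums = [x[i] + x[7 - i] for i in range(4)]
    diffs = [x[i] - x[7 - i] for i in range(4)]
    X = [0.0] * 8
    for r in range(4):
        X[2 * r] = 0.5 * sum(c * s for c, s in zip(EVEN[r], sums))
        X[2 * r + 1] = 0.5 * sum(c * d for c, d in zip(ODD[r], diffs))
    return X

sample = [12, -7, 30, 4, 0, 19, -25, 8]  # arbitrary test vector
max_err = max(abs(a - b) for a, b in zip(dct8_definition(sample), dct8_even_odd(sample)))
```

Splitting into sums and differences halves the multiplications: each output needs 4 products instead of 8.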


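Distributed algorithm of 1D DCT (continued):

In two's complement representation, with bits numbered so that bit B-1 is the MSB:

$$u_i = -b_{u_i}^{B-1} + \sum_{j=1}^{B-1} 2^{-j}\, b_{u_i}^{B-1-j}$$

where $b_{u_i}^{j}$ is the jth bit of $u_i$, and $b_{u_i}^{B-1}$ is the MSB, i.e. the sign bit.

$$X_n = \sum_{j=1}^{B-1} 2^{-j} D_n(b^{B-1-j}) - D_n(b^{B-1}), \quad \text{where } D_n(b^{j}) = \sum_{i=0}^{3} C_{i,n}\, b_{u_i}^{j}$$

Here $C_{i,n}$ is the coefficient that multiplies $u_i$ in the row for $X_n$ (the common factor 1/2 is omitted on this slide). With 16-bit inputs, writing $b_i^{j}$ for $b_{u_i}^{j}$:

$$\begin{bmatrix} X_0 \\ X_2 \\ X_4 \\ X_6 \end{bmatrix} = \begin{bmatrix} A & A & A & A \\ B & C & -C & -B \\ A & -A & -A & A \\ C & -B & B & -C \end{bmatrix} \begin{bmatrix} b_0^{15}\, b_0^{14} \ldots b_0^{0} \\ b_1^{15}\, b_1^{14} \ldots b_1^{0} \\ b_2^{15}\, b_2^{14} \ldots b_2^{0} \\ b_3^{15}\, b_3^{14} \ldots b_3^{0} \end{bmatrix}$$

For example, $D_0(b^{14}) = A\,b_0^{14} + A\,b_1^{14} + A\,b_2^{14} + A\,b_3^{14}$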
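The bit-plane sum can be checked numerically. The Python sketch below assumes 16-bit words read as two's complement fractions (bit 15 is the sign bit, and bit 15-j carries weight 2^-j); the helper names `bit`, `d_n`, `da_output` and `direct_output` are illustrative, not taken from the team's code.

```python
import math

B_BITS = 16  # word length B

A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
EVEN = [[A, A, A, A], [B, C, -C, -B], [A, -A, -A, A], [C, -B, B, -C]]

def bit(value, j):
    """Bit j (0 = LSB) of value viewed as a 16-bit two's complement word."""
    return ((value & 0xFFFF) >> j) & 1

def d_n(row, words, j):
    """D_n(b^j): sum of the coefficients whose input has bit j set."""
    return sum(c * bit(w, j) for c, w in zip(row, words))

def da_output(row, words):
    """X_n as a sum over bit planes; the sign-bit plane is subtracted."""
    acc = -d_n(row, words, B_BITS - 1)
    for j in range(1, B_BITS):
        acc += 2.0 ** (-j) * d_n(row, words, B_BITS - 1 - j)
    return acc

def direct_output(row, words):
    """Reference: ordinary multiply-accumulate on the fractional values."""
    return sum(c * (w / 2 ** (B_BITS - 1)) for c, w in zip(row, words))

words = [1200, -345, 32767, -32768]  # arbitrary 16-bit inputs u_0..u_3
da = [da_output(row, words) for row in EVEN]
ref = [direct_output(row, words) for row in EVEN]
```

Each `d_n` takes only 16 possible values per row, which is why a small ROM can replace the multipliers.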


1D DCT architecture

[Block diagram. Blocks: two 8x16 register files, selector, adder (+) and subtractor (-), two bit address generators, two ROMs, two adders each with a register (R), parallel to serial, control logic. Signals: in_data(16), in_valid, out_data(16), out_valid, out_ready, out_done, clk, reset, vdd, vss.]


2D DCT :

Two 1D DCT modules can operate as a pipeline to boost throughput. This requires a RAM that can be read and written at the same time, with each 1D DCT module reading or writing the RAM alternately in row and column order.

[Block diagram: Data in, 1D DCT (on rows), Transpose RAM, 1D DCT (on columns), Data out, with Control logic.]
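The row pass, transpose, and column pass can be modeled in a few lines. This is a Python sketch with hypothetical function names, with a list transpose standing in for the transpose RAM, and the same assumed orthonormal 1D DCT scaling as the direct double-sum reference it is compared against.

```python
import math

def dct8(x):
    """8-point DCT-II from the definition (assumed orthonormal scaling)."""
    out = []
    for k in range(8):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0
        s = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
        out.append(0.5 * ck * s)
    return out

def transpose(block):
    """Stand-in for the transpose RAM: written by rows, read by columns."""
    return [list(col) for col in zip(*block)]

def dct2d_row_column(block):
    rows_done = [dct8(row) for row in block]        # first 1D DCT, on rows
    cols_in = transpose(rows_done)                   # transpose RAM
    cols_done = [dct8(col) for col in cols_in]       # second 1D DCT, on columns
    return transpose(cols_done)                      # back to row-major order

def dct2d_direct(block):
    """Reference: 2-D DCT as a direct double sum."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            cv = 1 / math.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for r in range(8):
                for c in range(8):
                    s += (block[r][c]
                          * math.cos((2 * r + 1) * u * math.pi / 16)
                          * math.cos((2 * c + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * cu * cv * s
    return out

block = [[(3 * r + 5 * c) % 17 - 8 for c in range(8)] for r in range(8)]
fast = dct2d_row_column(block)
ref = dct2d_direct(block)
max_err = max(abs(fast[u][v] - ref[u][v]) for u in range(8) for v in range(8))
```

The final `transpose` only restores row-major order for the comparison; hardware could instead emit the coefficients in column order.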


Transistor count and performance estimation :

1DDCT module :

adder      register    ROM       Control logic   total   pins
4x16x30    18x16x20    8x16x2    1000            ~9k     40

2DDCT = 2x1DDCT + SRAM ~ 24k

throughput             latency
8 samples/64 cycles    528 cycles
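The estimate can be reproduced with simple arithmetic. In the Python sketch below, reading each product as units x bits x transistors per bit is an assumption, and the SRAM figure is only what remains of the quoted ~24k after two 1D modules, not a number given on the slide.

```python
# Assumed reading of each term: units x bits x transistors per bit.
adder = 4 * 16 * 30
register = 18 * 16 * 20
rom = 8 * 16 * 2
control = 1000

one_d_total = adder + register + rom + control   # the "~9k" on the slide

two_d_budget = 24_000                             # the "~24k" on the slide
sram_allowance = two_d_budget - 2 * one_d_total   # implied, not stated

samples_per_cycle = 8 / 64                        # quoted throughput

print(one_d_total, sram_allowance, samples_per_cycle)
```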


High level simulation (in C/C++) :

Three implementations of 1DDCT:

1. Based on definition

2. Based on fast algorithm

3. Based on distributed algorithm

[Flow diagram: the same input is fed to Function 1, Function 2, Function 3 and Matlab; the outputs are compared to give pass/fail.]


Step 1:

[Diagram: registers R0 ... R7 feeding a selector, followed by an adder and a subtractor.]

We begin by loading eight 16-bit values into individual registers (R0 to R7).

We use a selector to choose the registers that will be added and subtracted.

The R0 and R7 values are added and subtracted in parallel, and likewise for R1 and R6, R2 and R5, and R3 and R4.

It will take 8 clock cycles to get all the data.
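A minimal Python model of this pairing, assuming the registers are held in a list indexed 0 to 7; `butterfly_stage` is a hypothetical name, not one from the team's code.

```python
def butterfly_stage(regs):
    """Pair R[i] with R[7-i]; return the four sums and the four differences."""
    if len(regs) != 8:
        raise ValueError("expected eight register values")
    sums = []
    diffs = []
    for addr in range(4):
        first = regs[addr]        # R0, R1, R2, R3
        second = regs[7 - addr]   # R7, R6, R5, R4
        sums.append(first + second)
        diffs.append(first - second)
    return sums, diffs

regs = [10, 20, 30, 40, 50, 60, 70, 80]
sums, diffs = butterfly_stage(regs)
print(sums)   # [90, 90, 90, 90]
print(diffs)  # [-70, -50, -30, -10]
```

Python integers do not overflow, whereas the sum or difference of two 16-bit values can need 17 bits; the slides do not say how the hardware handles that case.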


Step 1 (Verilog)

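Write operation

    // data_buf replaces the original name buf, which is a reserved Verilog keyword.
    always @ (posedge clk or negedge rst) begin
      if (rst == 0) begin
        count <= 0;
      end else begin
        if (in_clr == 1) begin
          count <= 0;
        end else begin
          // Store one input word per cycle while the buffer has room.
          if (in_valid && ~out_full) begin
            data_buf[count] <= in_data;
            count <= count + 1;
          end
        end
      end
    end // always @ (posedge clk or negedge rst)

Read operation

    // Read the pair (Ri, R7-i) that feeds the adder and the subtractor.
    always @ (posedge clk) begin
      if (in_read) begin
        out_data1 <= data_buf[in_addr];
        out_data2 <= data_buf[7-in_addr];
      end
    end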


Step 2

Bit Address Generator

Store the results from the addition and subtraction in eight 16-bit registers.

Taking the same bit position from each of four registers (the four addition results, or the four subtraction results), the bit address generator combines the four bits into an address that selects the proper entry in the ROM.

[Diagram: bit 1 of each of four registers (R0 ... R7) forms a 4-bit address, e.g. 1011, into Rom0 ... Rom7.]
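The address formation and the matching ROM contents can be sketched in Python. The bit ordering (first register to address bit 3) follows the Step 2 Verilog; `build_rom` and `bit_address` are hypothetical helpers, and filling the ROM with the X2 row of the even matrix is just one example.

```python
import math

B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
ROW_X2 = [B, C, -C, -B]  # coefficients for X2 from the even matrix

def build_rom(row):
    """16-entry table: entry `addr` is the sum of coefficients whose address bit is 1."""
    rom = []
    for addr in range(16):
        bits = [(addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1]
        rom.append(sum(c * b for c, b in zip(row, bits)))
    return rom

def bit_address(regs, bitpos):
    """Gather bit `bitpos` of four 16-bit registers; regs[0] becomes address bit 3."""
    addr = 0
    for value in regs:
        addr = (addr << 1) | (((value & 0xFFFF) >> bitpos) & 1)
    return addr

regs = [0b10, 0b00, 0b11, 0b10]   # bit 1 of each register is 1, 0, 1, 1
addr = bit_address(regs, 1)
rom = build_rom(ROW_X2)
looked_up = rom[addr]             # B*1 + C*0 - C*1 - B*1
```

One lookup replaces four multiplications for that bit position, since the ROM already holds every possible sum of the four coefficients.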


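Step 2 (Verilog)

Read operation

    // data_buf replaces the original name buf, which is a reserved Verilog keyword.
    always @ (posedge clk or negedge rst) begin
      if (rst == 0) begin
        count <= 0;
      end else begin
        if (in_clr == 1) begin
          count <= 0;
        end else begin
          // Capture one sum or difference per cycle while the buffer has room.
          if (in_read & ~out_full) begin
            data_buf[count] <= in_data;
            count <= count + 1;
          end
        end
      end
    end

Bit address generator

    // Combinational: bit in_bitpos of each of the four words forms the ROM address.
    // A bit-select is used because a part-select with variable bounds is not legal.
    always @* begin
      out_addr[3] = data_buf[0][in_bitpos];
      out_addr[2] = data_buf[1][in_bitpos];
      out_addr[1] = data_buf[2][in_bitpos];
      out_addr[0] = data_buf[3][in_bitpos];
    end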


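Step 3

Parallel to Serial

The data read from the ROM at each address is added to an accumulator register, and the accumulated result is then shifted (in two's complement, a one-bit shift scales the value by a factor of two).

[Diagram: Rom0 ... Rom7, registers R5 and R6, select signals S1 and S0.]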
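A shift-and-add accumulator of this kind can be modeled as follows. This Python sketch processes the sign bit first and doubles the running total before each further ROM value is added; the integer coefficients are hypothetical stand-ins for the fixed-point ROM contents, and the MSB-first order is an assumption based on `bit_pos` counting down from 15 in the Verilog.

```python
def build_rom(coeffs):
    """16-entry table of coefficient sums; coeffs[0] pairs with address bit 3."""
    rom = []
    for addr in range(16):
        bits = [(addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1]
        rom.append(sum(c * b for c, b in zip(coeffs, bits)))
    return rom

def shift_accumulate(rom, regs, bits=16):
    """MSB-first distributed arithmetic: one ROM lookup per bit position."""
    mask = (1 << bits) - 1
    acc = 0
    for bitpos in range(bits - 1, -1, -1):
        addr = 0
        for value in regs:
            addr = (addr << 1) | (((value & mask) >> bitpos) & 1)
        term = rom[addr]
        if bitpos == bits - 1:
            acc = -term               # sign-bit plane is subtracted
        else:
            acc = (acc << 1) + term   # shift (x2), then add the next plane
    return acc

coeffs = [3, -5, 7, 2]                # hypothetical integer coefficients
regs = [1000, -2000, 123, -1]         # 16-bit two's complement inputs
result = shift_accumulate(build_rom(coeffs), regs)
expected = sum(c * r for c, r in zip(coeffs, regs))
```

After 16 steps the accumulator equals the ordinary dot product, with no multiplier involved.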


Step 3 (Verilog)

    always @ (posedge clk or negedge rst) begin
      if (rst == 0) begin
        out_data <= 0;
        bit_pos <= 15;
      end else begin
        if (in_clr == 1) begin
          out_data <= 0;
          bit_pos <= 15;
        end else begin
          // Accumulate one ROM value per bit position, MSB (15) first.
          // NOTE: no shift is applied here, although Step 3 describes one.
          if (~out_done) begin
            out_data <= out_data + in_data;
            bit_pos <= bit_pos - 1;
          end
        end // else: !if(in_clr==1)
      end
    end


C Code Result


::conclusion & questions

: Implementing 2D DCT

: Roughly 24k transistor count

: Verilog needs debugging