Low Power IP Design Methodology for Rapid Development of DSP Intensive SOC Platforms T. Arslan A.T. Erdogan S. Masupe C. Chun-Fu D. Thompson.


Contents

• Introduction to power consumption
• Introduction to Main Concepts
• Low Power Design Methodology
• IP implementations
• Results and conclusions

Power Consumption in CMOS-Based DSP Systems

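[Figure: CMOS gate with supply Vdd, input Vin, output Vout and load capacitance C_L, together with waveforms of Vin, Vout, the short-circuit current I_sc and the dynamic current I_dy against time t.]

$I_{dd} = I_{sc} + I_{dy}$

$P_{ave} = k\,C\,V_{dd}^{2}\,f + I_{sc}\,V_{dd} + I_{l}\,V_{dd}$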
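As a rough illustration of the average-power expression on this slide, the sketch below evaluates P_ave = k.C.Vdd^2.f + I_sc.Vdd + I_l.Vdd at a few operating points. It assumes that k is the switching-activity factor, C the load capacitance, f the clock frequency and I_l the leakage current (the slide does not define the symbols). The function name and all numeric values are hypothetical and chosen only to show the trends.

```python
def average_power(k, c_load, vdd, freq, i_sc, i_leak):
    """Average power in watts: dynamic + short-circuit + leakage terms."""
    dynamic = k * c_load * vdd ** 2 * freq   # k.C.Vdd^2.f
    short_circuit = i_sc * vdd               # I_sc.Vdd
    leakage = i_leak * vdd                   # I_l.Vdd
    return dynamic + short_circuit + leakage


# Hypothetical operating point (illustrative values, not from the slides).
base = dict(k=0.5, c_load=10e-12, freq=50e6, i_sc=1e-6, i_leak=1e-9)

p_high = average_power(vdd=3.3, **base)
p_low = average_power(vdd=1.65, **base)                        # halve Vdd
p_quiet = average_power(vdd=3.3, **{**base, "k": 0.25})        # halve k

print(f"Vdd = 3.30 V, k = 0.50: {p_high * 1e3:.4f} mW")
print(f"Vdd = 1.65 V, k = 0.50: {p_low * 1e3:.4f} mW")
print(f"Vdd = 3.30 V, k = 0.25: {p_quiet * 1e3:.4f} mW")
```

The dynamic term scales with the square of Vdd but only linearly with k and C, which is why the slides that follow treat supply voltage and effective switched capacitance C* = k.C as the two levers.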

Power Reduction Methods

Reduce C* = k.C

Reduce Vdd

• Supply Voltage Reduction

• Clock Gating

Disadvantage:

• Added design effort

Common Approaches to Low Power Design

Systematic Low Power Design Approach

Exploit correlations and redundancies within an algorithm, then map it to hardware.

Systematic Design Implementation Framework

[Figure: framework diagram. Labelled blocks: DSP Algorithm Library; Performance Criteria; Component Library; Block, Segmentation, etc.; Ordering algorithm; Data representation; Multiplier SC, Bus SC; CAD; Synthesis; Verilog/VHDL; Netlist.]

Rapid Design and IP-Based Integration Platforms

[Figure: platform diagram. Labelled blocks: P; IPx; IPy; further blocks indicated by ellipses; Developed IPs; Parameterisation Options.]

Design Flow for Filter IPs

[Figure: design-flow chart. Labelled stages: System Specifications; System Design (Verilog); Verification (Behavioural Simulation); Synthesis (Buildgate); Technology-Specific Netlist; Verification (Gate-Level Simulation); I/O Pads Placement; Floorplanning, Placement & Routing (Silicon Ensemble); Layout; Verification (Post-Layout Simulation); Tape-out Verification (Dracula DRC/ERC/LVS).]

FIR Filter Implementation

Typical Single Multiplier DSP Processor Architecture

[Figure: block diagram. Labelled blocks: ADC input x(n); DAC output y(n); Data memory; Coefficient memory; Data bus; Coefficient bus; Multiplier-accumulator (MAC) with Multiplier, Adder and Output register; Control.]

Transpose Direct Form (TDF) FIR Structure

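[Figure: signal flow graph of the TDF structure. The input x(n) is multiplied by the coefficients h(0), h(1), h(2), ..., h(N-1) in stage0, stage1, ..., stageN-1; the stages are linked by z^-1 delay elements and labelled with the values PCV0(n), PCV1(n), PCV2(n), ..., PCVN-1(n); the output is y(n).]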
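A minimal software sketch of a transposed direct form FIR filter may help when reading the structure above. In this form every incoming sample x(n) is multiplied by all N coefficients, and the partial sums held in the z^-1 registers are updated stage by stage. Treating those stored partial sums as the PCV values of the figure is an interpretation of the slide's labels, and the function names are hypothetical.

```python
def fir_tdf(h, x):
    """Transposed direct form FIR: one input sample feeds every tap."""
    n_taps = len(h)
    regs = [0] * (n_taps - 1)          # z^-1 registers holding partial sums
    y = []
    for sample in x:
        # The same data sample is multiplied by every coefficient h(k).
        products = [coeff * sample for coeff in h]
        y.append(products[0] + (regs[0] if regs else 0))
        # Shift the partial sums one stage towards the output.
        for j in range(n_taps - 2):
            regs[j] = products[j + 1] + regs[j + 1]
        if regs:
            regs[-1] = products[-1]
    return y


def fir_direct(h, x):
    """Reference direct-form convolution, used only to check fir_tdf."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


h = [1, 2, 3]
x = [1, 0, 0, 4, -1]
print(fir_tdf(h, x))
print(fir_direct(h, x))
```

Because the data operand of the multiplier stays fixed while the coefficients change, the order in which the N products are formed within one sample period is free, which is the property that coefficient ordering relies on.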

Modified DSP Processor Architecture for TDF FIR Filter Implementation

[Figure: block diagram. Labelled blocks: ADC input x(n); DAC output y(n); Coefficient memory; PCVM; Multiplier; Adder; Control; Data bus I; Data bus II; Coefficient bus.]

An Example SFG for IP2

Coefficient Memory Configuration with Coefficient Ordering

Order coefficients such that adjacent coefficients are highly correlated.

[Figure: flow for generating the coefficient memory contents. Filter Specifications -> Filter Design (Matlab) -> Coefficient Set -> Coefficient Ordering (C Routine) -> Ordered Coefficient Set -> Memory Configuration (C Routine) -> Coefficient Words]
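The ordering routine itself is not shown in the slides. As one possible illustration of ordering coefficients so that adjacent ones are highly correlated, the sketch below uses a greedy nearest-neighbour heuristic on the Hamming distance between 16-bit two's complement words. The heuristic, the word width and all function names are assumptions, not the authors' C routine.

```python
def to_word(value, width=16):
    """Two's complement bit pattern of value as an unsigned integer."""
    return value & ((1 << width) - 1)


def hamming(a, b, width=16):
    """Number of bit positions that differ between two coefficient words."""
    return bin(to_word(a, width) ^ to_word(b, width)).count("1")


def order_coefficients(coeffs, width=16):
    """Greedy ordering: always move to the closest remaining coefficient.

    Returns the original indices in their new order, so that each
    coefficient can still be tied to its own filter tap.
    """
    remaining = list(range(len(coeffs)))
    order = [remaining.pop(0)]
    while remaining:
        last = coeffs[order[-1]]
        nxt = min(remaining, key=lambda idx: hamming(last, coeffs[idx], width))
        remaining.remove(nxt)
        order.append(nxt)
    return order


def total_toggles(coeffs, order, width=16):
    """Bit toggles seen on the coefficient bus for a given access order."""
    return sum(
        hamming(coeffs[a], coeffs[b], width) for a, b in zip(order, order[1:])
    )


coeffs = [1, -1, 2, -2, 3]
natural = list(range(len(coeffs)))
ordered = order_coefficients(coeffs)
print("natural order toggles:", total_toggles(coeffs, natural))
print("greedy order:", ordered, "toggles:", total_toggles(coeffs, ordered))
```

Returning indices rather than values keeps the link between a reordered coefficient and the tap it belongs to, which is presumably why each coefficient word also carries a memory address.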

Coefficient Word:

h(k) | PCVMA | SF

SF: Shift Flag (SF = 1: shift; SF = 0: no shift)
PCVMA: Pre-Calculated Value Memory Address
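A small sketch of how a coefficient word with the fields h(k), PCVMA and SF could be packed and decomposed again. The field widths (16-bit coefficient, 7-bit address, 1-bit flag), the bit positions and the function names are assumptions made for illustration; the slides do not give them.

```python
COEFF_BITS = 16   # assumed width of the h(k) field
ADDR_BITS = 7     # assumed width of the PCVMA field


def pack_word(h, pcvma, sf):
    """Pack h(k), PCVMA and SF into one integer, with SF in the lowest bit."""
    if not 0 <= pcvma < (1 << ADDR_BITS):
        raise ValueError("PCVMA out of range")
    if sf not in (0, 1):
        raise ValueError("SF must be 0 or 1")
    h_field = h & ((1 << COEFF_BITS) - 1)        # two's complement field
    return (h_field << (ADDR_BITS + 1)) | (pcvma << 1) | sf


def unpack_word(word):
    """Decompose a coefficient word back into (h, pcvma, sf)."""
    sf = word & 1
    pcvma = (word >> 1) & ((1 << ADDR_BITS) - 1)
    h_field = (word >> (ADDR_BITS + 1)) & ((1 << COEFF_BITS) - 1)
    if h_field >= 1 << (COEFF_BITS - 1):         # restore the sign
        h_field -= 1 << COEFF_BITS
    return h_field, pcvma, sf


word = pack_word(-5, pcvma=3, sf=1)
print(hex(word), unpack_word(word))
```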

Coefficient Word Decomposition (Verilog Code)

An Example SFG for IP3

Memory Operations (Verilog Code)

Software Implementation Example for IP3

Power Evaluation

Filter Specifications

Lowpass filter specifications

Filter # | Passband (kHz) | Stopband (kHz) | Passband ripple (dB) | Stopband attenuation (dB) | Window function | Filter length
1 | 0 - 1.5 | 2 - 4 | 0.1 | 50 | Hamming | 53
2 | 0 - 1.2 | 1.7 - 5 | 0.01 | 40 | Kaiser | 71
3 | 0 - 3.375 | 5.625 - 10 | 0.002 | 90 | - | 42
4 | 0 - 1 | 1.5 - 5 | 0.0135 | 56 | - | 61
5 | 0 - 1.5 | 2 - 4 | 0.1 | 50 | Blackman | 89

Bandpass filter specifications

Filter # | Stopband (kHz) | Passband (kHz) | Stopband (kHz) | Passband ripple (dB) | Stopband attenuation (dB) | Window function | Filter length
1 | 0 - 0.1 | 0.15 - 0.25 | 0.3 - 0.5 | 0.1 | 60 | Kaiser | 73
2 | 0 - 0.45 | 0.9 - 1.1 | 1.55 - 7.5 | 0.8 | 30 | - | 34
3 | 0 - 5 | 8 - 12 | 15 - 44.14 | 0.00868 | 60 | Kaiser | 54
4 | 0 - 1 | 2 - 3.5 | 4.25 - 5 | 0.13 | 56.4 | - | 32
5 | 0 - 0.1 | 1.375 - 3.625 | 4 - 5 | 0.1 | 68.4 | - | 80

[Figure: bar chart of switched capacitance (pF, axis 0 to 5000) for IP1, IP2 and IP3, broken down into PCVM bus, coefficient bus, data bus and multiplier. Annotated reductions: 25% and 54%.]

Power Reductions Achieved (wordlength = 16 bit)

An example of a 6-tap FIR filter with block size of 3
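The figure for this example did not survive, so the following is only one common reading of block processing, not necessarily the scheme used in the IP. The sketch computes a block of consecutive outputs together, so each coefficient is fetched once per block instead of once per output. The function name and the read counter are hypothetical.

```python
def fir_block(h, x, block_size):
    """FIR filter that produces block_size outputs per pass over h.

    Returns the outputs and the number of coefficient fetches performed.
    """
    y = [0] * len(x)
    coeff_reads = 0
    for start in range(0, len(x), block_size):
        block = range(start, min(start + block_size, len(x)))
        for k, coeff in enumerate(h):
            coeff_reads += 1               # one fetch of h(k) per block
            for n in block:                # reuse it for every output in the block
                if n - k >= 0:
                    y[n] += coeff * x[n - k]
    return y, coeff_reads


h = [1, 2, 3, 4, 5, 6]                     # 6-tap filter
x = [1, 0, 2, 0, 3, 0]

y1, reads1 = fir_block(h, x, block_size=1)
y3, reads3 = fir_block(h, x, block_size=3)
print("block size 1:", reads1, "coefficient reads")
print("block size 3:", reads3, "coefficient reads")
print("same outputs:", y1 == y3)
```

Under this reading, coefficient memory accesses fall by a factor equal to the block size while the filter output is unchanged.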

Power Reductions for IP4 (wordlength = 16 bit)

[Figure: bar chart of switched capacitance (pF, axis 0 to 900) against block size (1, 2, 4, 8, 16), broken down into coefficient bus, data bus and multiplier. Annotated reductions: 40%, 42%, 50%, 53%.]

Reductions in Number of Memory Accesses (%)

[Figure: bar chart of reduction (%, axis 0 to 100) against block size (2, 4, 8, 16) for data memory and coefficient memory.]

Coefficient Segmentation Algorithm

[Figure: block diagram. Labelled blocks: Coefficient Set, split into Coefficient Set1 and Coefficient Set2; Data Set; Shifter; Multiplier; Adder; Output.]

Example Segmentations


Coefficient Segmentation Algorithm for Two’s Complement Coding

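[Flowchart. Recoverable contents:
Begin
H = (h_0, h_1, ..., h_{L-1})
i = 0, k = 0
Decisions (Yes/No branches): 2^i >= h_k; 2^i != h_k; h_k <= 0; k > L - 1
Actions: i = i + 1; s_k = 2^(i-1); s_k = -2^i; m_k = h_k - s_k; m_k = 0, s_k = h_k; i = 0; k = k + 1
End
Three points in the chart are numbered 1, 2 and 3.]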
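The flowchart is only partly recoverable, so the sketch below is one reading of it rather than the authors' routine. It assumes integer coefficients and splits each h_k into s_k + m_k, where s_k is zero or a signed power of two (suited to a shifter) and m_k is a small non-negative remainder (for the multiplier). For positive h_k it takes the largest power of two not above h_k; for negative h_k it takes minus the smallest power of two not below |h_k|. Function names are hypothetical.

```python
def segment_twos_complement(h):
    """Split integer h into (s, m) with h == s + m and m >= 0."""
    if h == 0:
        return 0, 0
    if h > 0:
        s = 1 << (h.bit_length() - 1)        # largest power of two <= h
    else:
        s = -(1 << (-h - 1).bit_length())    # -(smallest power of two >= |h|)
    return s, h - s


def segment_set(coeffs):
    """Apply the split to a whole coefficient set H, returning (S, M)."""
    pairs = [segment_twos_complement(h) for h in coeffs]
    return [s for s, _ in pairs], [m for _, m in pairs]


H = [5, 8, -5, -8, -1, 0]
S, M = segment_set(H)
print("S =", S)
print("M =", M)
```

Keeping m_k non-negative and small means its two's complement word has mostly zero upper bits, which is the property that would lower switching activity at the multiplier input.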

Coefficient Segmentation Algorithm for Sign-Magnitude Coding

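[Flowchart. Recoverable contents:
Begin
H = (h_0, h_1, ..., h_{L-1})
i = 0, k = 0
Decisions (Yes/No branches): 2^i >= h_k; h_k - 2^i < h_k - 2^(i-1); h_k < 0; k > L - 1
Actions: i = i + 1; m_k = h_k - 2^(i-1), s_k = 2^(i-1); m_k = h_k - 2^i, s_k = 2^i; m_k = -m_k, s_k = -s_k; i = 0; k = k + 1
End
Three points in the chart are numbered 1, 2 and 3.]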
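For sign-magnitude coding, the recoverable boxes suggest picking whichever of the two neighbouring powers of two is closer to the coefficient and restoring the sign afterwards. The sketch below follows that reading. It assumes integer coefficients and that the comparison in the flowchart is between absolute differences (the extracted text shows no absolute-value bars). Function names are hypothetical.

```python
def segment_sign_magnitude(h):
    """Split integer h into (s, m) with h == s + m and |m| as small as possible.

    s is zero or a signed power of two; m carries the same sign convention
    as the flowchart's final negation step for negative coefficients.
    """
    if h == 0:
        return 0, 0
    a = abs(h)
    lower = 1 << (a.bit_length() - 1)    # largest power of two <= |h|
    upper = lower << 1                   # next power of two above it
    s = upper if (upper - a) < (a - lower) else lower
    m = a - s
    if h < 0:                            # restore the sign of both parts
        s, m = -s, -m
    return s, m


H = [7, 5, -6, 8, 0]
pairs = [segment_sign_magnitude(h) for h in H]
print(pairs)
```

In sign-magnitude form a small |m_k| has few set magnitude bits regardless of sign, so rounding to the nearest power of two is attractive here, whereas the two's complement variant keeps m_k non-negative.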

Total switching activity of H and M coefficient sets with Two’s Complement Coding

Total switching activity of H and M coefficient sets with Sign-Magnitude Coding
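One simple way to estimate the total switching activity of a coefficient set is to count the bits that toggle between consecutive words under each coding. The sketch below does this for two's complement and sign-magnitude words; it can be applied to an original set H or to a segmented set M. The metric, the function names and the sample values are assumptions, not the measurement procedure used for the slides.

```python
def encode_twos(value, width):
    """Two's complement bit pattern as an unsigned integer."""
    return value & ((1 << width) - 1)


def encode_sign_magnitude(value, width):
    """Sign bit in the MSB, magnitude in the remaining bits."""
    if abs(value) >= 1 << (width - 1):
        raise ValueError("magnitude does not fit in the word")
    sign = (1 << (width - 1)) if value < 0 else 0
    return sign | abs(value)


def total_switching(values, encode, width=16):
    """Sum of bit toggles between consecutive encoded words."""
    words = [encode(v, width) for v in values]
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


sample = [3, -3, 5, -5]    # illustrative values only
print("two's complement:", total_switching(sample, encode_twos, width=8))
print("sign-magnitude:  ", total_switching(sample, encode_sign_magnitude, width=8))
```

Small values that alternate in sign toggle nearly every bit in two's complement but only the sign bit and a few magnitude bits in sign-magnitude, which is consistent with the slides treating the two codings separately.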

[Figure: block diagram. Labelled blocks: Coefficient Memory; Data Memory; Control; Multiplier (two’s); Add/Sub; Acc; Output. Signal annotations: MSB (coefficient); (two’s complement); (sign magnitude).]

Simplified Filter Architecture for Mixed-Mode Multiplication

[Figure: block diagram. Labelled blocks: Coefficient Memory; Data Memory; Control; Multiplier (sign); Add; Acc; Sign two’s; Output. Signal annotations: (sign magnitude) on both multiplier inputs.]

Simplified Filter Architecture for Sign-Magnitude Multiplication

[Figure: number of transitions per sample (axis 0 to 45) against bit position (b0 to b14), comparing conventional and segmentation.]

Example Switching Activity Distribution with Two’s Complement Coding (N=89, W=16)

[Figure: number of transitions per sample (axis 0 to 50) against bit position (b0 to b14).]

Example Switching Activity Distribution with Sign-Magnitude Coding (N=89, W=16)

Power Reductions Achieved with Coefficient Segmentation

Multiplier size | Algorithm | Two’s complement swcap/sample (pF) | Reduction (%) | Mixed mode swcap/sample (pF) | Reduction (%) | Sign-magnitude swcap/sample (pF) | Reduction (%)
8-bit | conventional | 497 | - | 294 | - | 162 | -
8-bit | segmentation | 236 | 52.52 | 222 | 24.49 | 81 | 50.00
16-bit | conventional | 3862 | - | 2511 | - | 2173 | -
16-bit | segmentation | 2058 | 46.71 | 1806 | 28.08 | 1452 | 33.18
24-bit | conventional | 14795 | - | 12281 | - | 11458 | -
24-bit | segmentation | 11051 | 25.31 | 10283 | 16.27 | 9367 | 18.25
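The reduction percentages in this table appear to follow from the switched-capacitance figures as (conventional - segmentation) / conventional x 100. The short sketch below recomputes them from the tabulated values as a consistency check; the dictionary layout and function name are hypothetical.

```python
# (conventional, segmentation) switched capacitance per sample in pF,
# copied from the table for each multiplier size and data representation.
SWCAP = {
    "8-bit": {"twos": (497, 236), "mixed": (294, 222), "sign": (162, 81)},
    "16-bit": {"twos": (3862, 2058), "mixed": (2511, 1806), "sign": (2173, 1452)},
    "24-bit": {"twos": (14795, 11051), "mixed": (12281, 10283), "sign": (11458, 9367)},
}


def reduction_percent(conventional, segmentation):
    """Relative saving of segmentation over the conventional scheme, in %."""
    return round((conventional - segmentation) / conventional * 100, 2)


for size, modes in SWCAP.items():
    row = {mode: reduction_percent(*pair) for mode, pair in modes.items()}
    print(size, row)
```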

[Figure: bar chart of switched capacitance (pF, axis 0 to 4000) against data representation (twos, mixed, sign). Annotated reductions: 47%, 35%, 53%, 44%, 62%.]

Power Reduction in Multiplier Circuit (wordlength = 16 bit)

[Figure: bar chart of switched capacitance (pF, axis 0 to 4000) against data representation (twos, mixed, sign), broken down into multiplier and shifter. Annotated reductions: 46%, 35%, 51%, 44%, 61%.]

Power Reduction (wordlength = 16 bit)

Power Reduction at Coefficient Bus (wordlength = 16 bit)

[Figure: bar chart of switched capacitance (pF, axis 0 to 400) against data representation (twos, mixed, sign). Annotated reductions: 49%, 37%, 54%, 37%, 54%.]

DCT Implementation Scheme

2-D DCT Implementation Approach

Simplified Architecture of the DCT Processor

Conventional Programmable FIR Filter Architecture

TDF with Coefficient Ordering Programmable FIR Filter Architecture

Power Reduction (%)

[Figure: block symbol of IP1 with signals tNC, Reset, Load, Clock, Data, Coefficient, Output and Of/Uf.]

Top View of IP1

Block Report for IP1

[Figure: block symbol of IP2 with signals tNC, Reset, Load, Clock, Data, Coefficient, Output and Of/Uf.]

Top View of IP2


[Figure: block symbol of IP3 with signals tNC, Reset, Load, Clock, Data, Coefficient Word, Output and Of/Uf.]

Top View of IP3


[Figure: bar chart of area (axis 0 to 16000) against wordlength (8-bit, 16-bit, 24-bit) for IP1, IP2 and IP3.]

Area Comparison

Top View of IP4

[Figure: block symbol of IP4 with signals tNC, Reset, Load, Clock, Data, Coefficient, Block Size, Output and Of/Uf.]

[Figure: block symbol of IP5 with signals tNC, Reset, Load, Clock, Data, Coefficient Word, Output and Of/Uf.]

Top View of IP5

Top View of IP6

Case Study: a 34-tap bandpass filter

Area and Power Characteristics for the Example Filter

Conclusions

• A methodology for Low Power Implementation of DSP functions has been presented.

• The methodology has been used to develop a number of IPs.

• Significant reductions in power are reported.

• Power reduction is achieved in the multiplier and system buses.

• Methodology can be used for prototyping other DSP functions.
