Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. http://vada.skku.ac.kr

Dec 13, 2015

Marian Cross
Transcript
Page 1: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Low Power System Level Design Methodologies

Jun-Dong Cho, SungKyunKwan Univ.

Dept. of ECE, Vada Lab. http://vada.skku.ac.kr

Page 2: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

VLSI Algorithmic Design Automation Lab.

Contents

• Introduction to System Level Design
• Hardware and Software Co-design
• Re-configurable Processors
• Other Low Power System Level Designs

Page 3: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Introduction to SOC

• SOC will bridge the gap between software and its implementation in novel, energy-efficient silicon architectures.
• Chips are assembled at the level of IP blocks and IP interfaces rather than at gate level: design reuse.
• SOC specs are coming from ICT system engineers rather than from RTL descriptions.

Page 4: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Four main applications

• Set-top box: mobile multimedia system, base station for the home local-area network.
• Digital PCTV: concurrent use of TV, 3D graphics, and Internet services
• Set-top box LAN service: wireless home networks, multi-user wireless LAN
• Navigation system: steer and control traffic and/or goods transportation

Page 5: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Types of System-on-a-Chip Designs

Page 6: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Silicon in 2010

Die Area: 2.5 x 2.5 cm
Voltage: 0.6 V
Technology: 0.07 µm

Memory          Density (Gbits/cm2)   Access Time (ns)
DRAM            8.5                   10
DRAM (Logic)    2.5                   10
SRAM (Cache)    0.3                   1.5

Logic            Density (Mgates/cm2)   Max. Ave. Power (W/cm2)   Clock Rate (GHz)
Custom           25                     54                        3
Std. Cell        10                     27                        1.5
Gate Array       5                      18                        1
Single-Mask GA   2.5                    12.5                      0.7
FPGA             0.4                    4.5                       0.25

Page 7: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Why Lower Power

• Portable systems
   – long battery life
   – light weight
   – small form factor
• IC priority list
   – power dissipation
   – cost
   – performance
• Technology direction
   – reduced voltage/power designs based on mature high performance IC technology, with high integration to optimize size, cost, power, and speed

Page 8: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Microprocessor Power Dissipation

[Figure: power (W, 0 to 50) versus year (1980 to 2000). Plotted processors: i286, i386 DX 16, i486 DX 25, i486 DX 50, i486 DX2 66, i486 DX4 100, P5 66, P6 166, P II 300, P III 500, P-PC601 50, P-PC604 133, P-PC750 400, Alpha 21064 200, Alpha 21164, Alpha 21264.]

Page 9: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Three Factors Affecting Energy

– Hardware simplification: redundant h/w extraction
– All-in-one approach (SOC): I/O pin reduction
– Voltage-reducible hardware
   • 2-D pipelining (systolic arrays)
   • SIMD (Single Instruction stream, Multiple Data stream) parallel processing: useful for data with parallel structure
   • VLIW (Very Long Instruction Word) approach: flexible
   • MIMD (Multiple Instruction streams, Multiple Data streams)

Page 10: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

New Computing Platforms

• SOC power efficiency of more than 10 GOPS/W
• Higher on-chip system integration: COTS: 100 W, SOAC: 10 W (inter-chip capacitive loads, I/O buffers)
• Speed & performance: shorter interconnection, fewer drivers, faster devices, more efficient processing architectures
• Mixed signal systems
• Reuse of IP blocks
• Multiprocessor, configurable computing
• Domain-specific, combined memory-logic

$P = k C F V^{2}$

Page 11: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Physical gap

• Timing closure problem: layout-driven logic and RT-level synthesis
• Energy efficiency requires locality of computation and storage: a match for stream-based data processing of speech, images, and multimedia-system packets.
• Next generation SOC designers must bridge the architectural gap between system specification and energy-efficient IP-based architectures, while CAE vendors and IP providers will bridge the physical gap.

Page 12: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Levels for Low Power Design

Level           Techniques
System          Hardware-software partitioning, Power down
Algorithm       Complexity, Concurrency, Locality, Regularity, Data representation
Architecture    Parallelism, Pipelining, Signal correlations, Instruction set selection, Data rep.
Circuit/Logic   Sizing, Logic Style, Logic Design
Technology      Threshold Reduction, Scaling, Advanced packaging, SOI

Possible Power Savings at Different Design Levels

Level of Abstraction   Expected Saving
Algorithm              10 - 100 times
Architecture           10 - 90%
Logic Level            20 - 40%
Layout Level           10 - 30%
Device Level           10 - 30%

Page 13: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Low Power Design Flow I

[Figure: flowchart with the following boxes: System-Level Specification; Function Partitioning and HW/SW Allocation; System-Level Power Analysis; Behavioral Description; Software Functions; Processor Selection; Power-driven Behavioral Transformation; Behavioral-Level Power Analysis; Power Conscious Behavioral Description; High-Level Synthesis and Optimization; RT-Level Power Analysis; Software Optimization; Software-Level Power Analysis; To RT-Level Design.]

Page 14: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Low Power Design Flow II

[Figure: flowchart with the following boxes: RT-level Description; RTL mapping; RTL Library (Data-path, Controller, Processor, Control and Steering Logic, Memory, RTL Macrocells); Logic Synthesis and Optimization; Standard cell Library; Gate-Level Power Analysis; Gate-level Description; High-Level Synthesis and Optimization; Switch-Level Power Analysis; Switch-level Description.]

Page 15: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Reducing Waste

• Locality of reference
• Demand-driven / Data-driven computation
• Application-specific processing
• Preservation of data correlations
• Distributed processing

Page 16: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Eliminating Redundant Computations

Page 17: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Power-hungry Applications

• Signal Compression: HDTV Standard, ADPCM, Vector Quantization, H.263, 2-D motion estimation, MPEG-2 storage management
• Digital Communications: Shaping Filters, Equalizers, Viterbi decoders, Reed-Solomon decoders

Page 18: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

IBM’s PowerPC Lower Power Architecture

• Optimum supply voltage through hardware parallelism, pipelining, and parallel instruction execution
   – 603e executes five instructions in parallel (IU, FPU, BPU, LSU, SRU)
   – FPU is pipelined so a multiply-add instruction can be issued every clock cycle
   – Low power 3.3-volt design
• Use small complex instructions with smaller instruction length
   – IBM’s PowerPC 603e is RISC
• Superscalar: CPI < 1
   – 603e issues as many as three instructions per cycle
• Low power management
   – 603e provides four software-controllable power-saving modes.
• IBM’s Blue Logic ASIC: new design reduces power by a factor of 10

Page 19: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Power-Down Techniques

◆ Lowering the voltage along with the clock actually alters the energy-per-operation of the microprocessor, reducing the energy required to perform a fixed amount of work
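
A minimal numeric sketch of this point, assuming the usual first-order model in which each operation switches a fixed capacitance C and costs C·Vdd² of dynamic energy. The function names, the capacitance, the operation count, and the voltages are illustrative assumptions, not values from the slides.

```python
def task_energy(num_ops, switched_cap_farads, vdd):
    """Dynamic energy in joules for a fixed amount of work.

    Assumes every operation switches `switched_cap_farads` once, so the
    energy per operation is C * Vdd^2 and is independent of the clock rate.
    """
    return num_ops * switched_cap_farads * vdd ** 2


def average_power(num_ops, switched_cap_farads, vdd, clock_hz):
    """Average power in watts, assuming one operation completes per clock cycle."""
    run_time_s = num_ops / clock_hz
    return task_energy(num_ops, switched_cap_farads, vdd) / run_time_s


OPS = 1_000_000   # fixed workload (illustrative)
CAP = 1e-9        # switched capacitance per operation in farads (illustrative)

full_speed = task_energy(OPS, CAP, vdd=3.3)
slow_clock_only = task_energy(OPS, CAP, vdd=3.3)    # slower clock, same voltage
slow_clock_low_vdd = task_energy(OPS, CAP, vdd=2.2)  # slower clock and lower voltage

print(f"energy at 3.3 V: {full_speed:.4f} J")
print(f"energy at 2.2 V: {slow_clock_low_vdd:.4f} J "
      f"({slow_clock_low_vdd / full_speed:.0%} of the original)")
```

Under these assumptions, slowing the clock alone halves the average power but leaves the energy for the task unchanged; only the lower supply voltage reduces the energy per operation.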

Page 20: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Voltage vs Delay

• Use variable voltage scaling or scheduling for real-time processing
• Use architecture optimization to compensate for slower operation, e.g., parallel processing and pipelining to increase concurrency and shorten the critical path.
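
The trade-off can be sketched numerically. The example below assumes a first-order delay model, delay ∝ Vdd / (Vdd − Vt)², and ideal parallelism with no extra capacitance unless `cap_overhead` is raised; the model, the 3.3 V reference, the 0.7 V threshold, and all function names are illustrative assumptions rather than figures from the slides.

```python
def gate_delay(vdd, vt=0.7):
    """Relative gate delay under the assumed model: Vdd / (Vdd - Vt)^2."""
    return vdd / (vdd - vt) ** 2


def voltage_for_slowdown(slowdown, v_ref=3.3, vt=0.7, tol=1e-9):
    """Supply voltage at which gates are `slowdown` (>= 1) times slower than at v_ref."""
    target = slowdown * gate_delay(v_ref, vt)
    lo, hi = vt + 1e-6, v_ref      # delay falls monotonically as Vdd rises above Vt
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gate_delay(mid, vt) > target:
            lo = mid               # still too slow: raise the voltage
        else:
            hi = mid
    return (lo + hi) / 2


def parallel_power_ratio(n_units, v_ref=3.3, vt=0.7, cap_overhead=1.0):
    """Power of n parallel units clocked at f/n, relative to one unit at f and v_ref."""
    v_new = voltage_for_slowdown(n_units, v_ref, vt)
    # n units -> n times the capacitance; each unit runs at 1/n of the clock rate.
    return n_units * cap_overhead * (v_new / v_ref) ** 2 * (1.0 / n_units)


v2 = voltage_for_slowdown(2)
print(f"two-way parallel: Vdd can drop to about {v2:.2f} V, "
      f"power ratio about {parallel_power_ratio(2):.2f}")
```

With these assumed numbers, two parallel units at half the clock rate keep the throughput while running at roughly 2.2 V instead of 3.3 V. Routing and multiplexing overhead, which real designs incur, can be approximated by setting `cap_overhead` above 1.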

Page 21: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Low Voltage Main Memories

Page 22: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Why Copper Processor?

• Motivation: aluminum increasingly resists the flow of electricity as wires are made thinner and narrower.
• Performance: 40% speed-up
• Cost: 30% less expensive
• Power: less power from batteries
• Chip size: 60% smaller than an aluminum chip

Page 23: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Silicon-on-Insulator

How Does SOI Reduce Capacitance?

• An insulating layer (similar to glass) is placed between the impurities and the silicon substrate, which eliminates the junction capacitance.
• Result: high performance, low power, low soft error

Page 24: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Design Challenges

• Current systems are complex and heterogeneous
   – Contain many different types of components
• Half of the chip can be filled with 200 low-power, RISC-like processors (ASIP) interconnected by field-programmable buses, embedded in 20 Mbytes of distributed DRAM and flash memory
• Computational power will not result from multi-GHz clocking but from parallelism, with clock rates below 200 MHz. This will greatly simplify the design for correct timing, testability, and signal integrity.

Page 25: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Application-Specific Instruction Processor

• Processor architecture tailored not just for an application domain (e.g., DSP, microcontrollers), but for specific sets of applications (e.g., audio, engine control)
• ASIP characteristics
   – Greater design cost (processor + compiler)
   – Higher performance, lower power than commercial cores, more flexibility than ASIC

Page 26: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

ASIP Design

• Given a set of applications, determine the microarchitecture of the ASIP (i.e., configuration of functional units in datapaths, instruction set)
• To accurately evaluate the performance of the processor on a given application, one needs to compile the application program onto the processor datapath and simulate the object code
• However, the microarchitecture of the processor is itself a design parameter!

Page 27: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

ASIP Design Flow

Page 28: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Compiler Optimizations

• Machine independent optimizations
   – Parallelizing transformations
   – Common subexpression elimination
   – Constant propagation
   – Strength reduction
   – Loop invariant code motion
• Machine dependent optimizations
   – Loop unrolling and software pipelining
   – Static allocation (non-recursive procedure calls)
   – Storage layout (arrays, scalars)
   – Optimization of mode setting instructions
   – Instruction selection, scheduling, and register allocation

Page 29: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Loop unrolling

The technique of loop unrolling replicates the body of a loop some number of times (unrolling factor u) and then iterates by step u instead of step 1. This transformation reduces the loop overhead, increases the instruction parallelism and improves register, data cache or TLB locality.

Original loop:

    for i = 2 to N - 1
        A(i) = A(i) + A(i-1) * A(i+1)

Unrolled with u = 2:

    for i = 2 to N - 2 step 2
        A(i)   = A(i)   + A(i-1) * A(i+1)
        A(i+1) = A(i+1) + A(i)   * A(i+2)

Loop overhead is cut in half because two original iterations are performed in each pass of the unrolled loop. If array elements are assigned to registers, register locality is improved because A(i) and A(i+1) are used twice in the loop body. Instruction parallelism is increased because the second assignment can be performed while the results of the first are being stored and the loop variables are being updated.
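
As a hedged illustration of the transformation, the sketch below writes the same update both ways in Python with 0-based indexing. The function names are hypothetical, and the cleanup step for an odd trip count is an addition that the slide does not show. Python will not show the speed-up a compiler obtains; the point is only that the two forms compute the same result.

```python
def update_rolled(a):
    """Original loop: a[i] = a[i] + a[i-1] * a[i+1] for every interior index."""
    a = list(a)
    for i in range(1, len(a) - 1):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


def update_unrolled(a):
    """Same loop unrolled by u = 2, with one cleanup iteration if needed."""
    a = list(a)
    last = len(a) - 1                 # exclusive upper bound of the original loop
    i = 1
    while i + 1 < last:               # two original iterations per pass
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
        i += 2
    if i < last:                      # odd trip count: finish the leftover iteration
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


data = [1, 2, 3, 4, 5, 6, 7]
print(update_rolled(data) == update_unrolled(data))
```

The second statement in the unrolled body reads the `a[i]` just written by the first, so the order of the two statements must be preserved.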

Page 30: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Loop Unrolling (IIR filter example)

Loop unrolling localizes the data to reduce the activity at the inputs of the functional units; here, two output samples are computed in parallel from two input samples:

$Y_{n-1} = X_{n-1} + A\,Y_{n-2}$
$Y_{n} = X_{n} + A\,Y_{n-1} = X_{n} + A\,(X_{n-1} + A\,Y_{n-2})$

Neither the capacitance switched nor the voltage is altered. However, loop unrolling enables several other transformations (distributivity, constant propagation, and pipelining). After distributivity and constant propagation:

$Y_{n-1} = X_{n-1} + A\,Y_{n-2}$
$Y_{n} = X_{n} + A\,X_{n-1} + A^{2}\,Y_{n-2}$

The transformation yields a critical path of 3, thus the voltage can be dropped.
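
A small sketch can check that the unrolled form of the first-order recursion $Y_n = X_n + A\,Y_{n-1}$, after distributivity and constant propagation, produces the same outputs as the direct form. The function names and the handling of an odd-length input are assumptions made for this illustration.

```python
def iir_direct(x, a, y_init=0.0):
    """Reference recursion: y[n] = x[n] + a * y[n-1]."""
    y, prev = [], y_init
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y


def iir_unrolled(x, a, y_init=0.0):
    """Two outputs per pass, both computed from y[n-2] only.

    y[n-1] = x[n-1] + a * y[n-2]
    y[n]   = x[n] + a * x[n-1] + (a * a) * y[n-2]
    """
    a_sq = a * a                        # constant propagation: computed once
    y, prev2 = [], y_init
    for k in range(0, len(x) - 1, 2):
        y_first = x[k] + a * prev2
        y_second = x[k + 1] + a * x[k] + a_sq * prev2
        y.extend([y_first, y_second])
        prev2 = y_second
    if len(x) % 2:                      # leftover sample when the length is odd
        y.append(x[-1] + a * prev2)
    return y


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
print(iir_direct(samples, 0.5))
print(iir_unrolled(samples, 0.5))
```

In the unrolled form neither output waits for the other, which is what shortens the critical path per pair of samples and leaves timing slack that can be traded for a lower supply voltage.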

Page 31: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Loop Unrolling for Low Power

Page 32: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Loop Unrolling for Low Power

Page 33: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Loop Unrolling for Low Power

Page 34: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Implementing Digital Systems

Page 35: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Configurability

• One-M gate reconfigurable, one-M gate hardwired logic.
• Reduce design risks for which NRE costs will become dominant
• 50 GIPS for programmable components or 500 GIPS for dedicated hardware
• 1 V supply with power in the watt range

Page 36: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Bridging the architectural gap

• Product reliability: design at a level far above the RT level, with reuse factors in excess of 100
• Trade-off: 100 MOPS/watt (microprocessor) versus 100 GOPS/watt (hardwired)
• Reconfigurable computing with a large number of computing nodes and a very restricted instruction set (Pleiades)

Page 37: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Cross-Disciplinary nature

• Software for low power: loop transformation leads to much higher temporal and spatial locality of data.
• Code size becomes an important objective
• Software will eventually become a part of the chip
• Behavior-platform-compiler codesign: codesigned with C++ or JAVA, describing their h/w and s/w implementation.
• Multidisciplinary system thinking is required for future designs (e.g., Eindhoven Embedded Systems Institute http://www.eesi.tue.nl/english)

Page 38: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Low Power DSP

• Most of the execution time is spent in DO-loops
   – VSELP Vocoder: 83.4 %
   – 2D 8x8 DCT: 98.3 %
   – LPC computation: 98.0 %
• Power minimization of the DO-loops ==> power minimization of the DSP

VSELP: Vector Sum Excited Linear Prediction
LPC: Linear Prediction Coding

Page 39: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

VLSI Signal Processing Design Methodology

• pipelining, parallel processing, retiming, folding, unfolding, look-ahead, relaxed look-ahead, and approximate filtering
• bit-serial, bit-parallel and digit-serial architectures, carry save architecture
• redundant and residue systems
• Viterbi decoder, motion compensation, 2D-filtering, and data transmission systems

Page 40: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Common Fabric for IP Blocks

• Soft IP blocks are portable, but not as predictable as hard IP.
• Hard IP blocks are very predictable since a specific physical implementation can be characterized, but are hard to port since they are often tied to a specific process.
• A common fabric is required for both portability and predictability.
• Wide availability: Cell Based Array, a metal programmable architecture that provides the performance of a standard cell and is optimized for synthesis.

Page 41: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

H/W and S/W Co-design

Page 42: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Mixing H/W and S/W

• Argument: mixed hardware/software systems represent the best of both worlds. High performance, flexibility, design reuse, etc.
• Counterpoint: from a design standpoint, it is the worst of both worlds
   – Simulation: problems of verification and test become harder
   – Interface: too many tools, too many interactions, too much heterogeneity
   – Hardware/software partitioning is “AI-complete”!

Page 43: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Partitioning

• Performance requirements
   – Some functions are easier to implement in hardware
   – Blocks that are used repeatedly
   – Blocks with a parallel structure
• Modifiability
   – Blocks implemented in software are easy to modify
• Implementation cost
   – Blocks implemented in hardware can be shared
• Scheduling
   – Schedule the blocks partitioned into HW and SW so that the given constraints are met
   – SW operations must be scheduled sequentially
   – SW and HW can be scheduled concurrently as long as there are no data or control dependencies

Page 44: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Low power partitioning approach

• Different HW resources are invoked according to the instruction executed at a specific point in time
   – During the execution of an add operation, the ALU and registers are used, but the multiplier is idle.
• Non-active resources still consume energy, since their circuits continue to switch
• Calculate the wasted energy
• Add application-specific cores and run only the part that is needed
• Whenever one core is performing, all the other cores are shut down

Page 45: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Effective Resource Utilization

[Figure: data-flow graph of adders (+) and delay elements (D) before and after retiming, with the cycle-by-cycle schedule of the numbered operations on the adder and the multipliers for each version.]

Retiming can reduce interconnect capacitance.

Page 46: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Partitioning Process

- Derive a graph G
   - operations and connections
- Decompose G into a set of clusters
   - cluster: set of operations
- Calculate bus-traffic energy
- Pre-select clusters with constraints
- Set the number of resources
- List scheduling
- Test the utilization rate (ASIC or µP)
   - the utilization rate of the µP is supported by a SW estimation tool
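
The list-scheduling and utilization steps can be sketched as follows. This is a simplified illustration, not the tool described on the slide: every operation is assumed to take one cycle, the priority order is simply the listing order, and the operation names, resource types, and function names are all hypothetical.

```python
def list_schedule(ops, deps, resources):
    """Unit-latency list scheduling.

    ops:       {op_name: resource_type}
    deps:      {op_name: [names of predecessor ops]}
    resources: {resource_type: units available per cycle}
    Returns {op_name: start_cycle}, with cycles counted from 0.
    """
    start = {}
    remaining = list(ops)              # priority = listing order (an assumption)
    cycle = 0
    while remaining:
        free = dict(resources)
        chosen = []
        for op in remaining:
            ready = all(p in start and start[p] < cycle for p in deps.get(op, []))
            if ready and free.get(ops[op], 0) > 0:
                free[ops[op]] -= 1
                chosen.append(op)
        if not chosen:
            raise ValueError("no schedulable operation: missing resource or cyclic dependency")
        for op in chosen:
            start[op] = cycle
            remaining.remove(op)
        cycle += 1
    return start


def utilization(ops, start, resources):
    """Fraction of available unit-cycles in which each resource type is busy."""
    total_cycles = max(start.values()) + 1
    busy = {r: 0 for r in resources}
    for op in start:
        busy[ops[op]] += 1
    return {r: busy[r] / (resources[r] * total_cycles) for r in resources}


ops = {"m1": "mul", "m2": "mul", "a1": "add", "m3": "mul", "a2": "add"}
deps = {"a1": ["m1", "m2"], "a2": ["a1", "m3"]}
resources = {"mul": 1, "add": 1}

schedule = list_schedule(ops, deps, resources)
print(schedule)
print(utilization(ops, schedule, resources))
```

In this toy cluster the adder is busy in only half of the cycles. A low utilization rate of this kind is what flags resources that sit idle, and still switch, for much of the execution.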

Page 47: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Design Flow

[Figure: flowchart with the following boxes: Application; Divide application into clusters; List schedule; Compute utilization rate (ASIC); Select cluster; Compute utilization rate (µP); Core energy estimation; HW synthesis; Evaluate.]

- Max 94% energy saving, and in most cases even reduced execution time
- 16k cell overhead

Page 48: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Interface

• Need for an interface block
   – Data transfer between hardware and software blocks
   – Only an efficiently designed interface block can reduce the overhead between HW and SW blocks
• Interface methods
   – Shared memory
   – FIFO
   – Handshaking protocol

Page 49: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Logical Bus Architecture

• System Bus Signals
   – address, data, control signals
   – the address space consists of the memory space & I/O space
   – memory space: memory of the SW component
   – I/O space: ports within SW & registers in other HW
• Port Signals
   – specialized signals capable of directly interfacing between SW & HW components
• Interrupt Signals
   – raised when SW & HW components have completed an operation, or when an error condition is detected

Page 50: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Co-Simulation

• Need for co-simulation
   – Simulating the HW part and the SW part together makes it possible to predict the results of the composed system
   – Predicting system performance allows the system to be redesigned to meet the given spec before synthesis
   – It predicts the characteristics of each sub-block for HW/SW partitioning
• Co-simulation tools
   – Ptolemy
   – COSSAP
   – POLIS

Page 51: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Hardware/Software Co-Design Flow

[Figure: flowchart with the following boxes: Analysis of Constraints & Requirements; System Specification; Hardware & Software Partitioning; Hardware Description; Software Description; Interface Synthesis; Hardware Synthesis & Configuration; Software Generation & Parameterization; Configuration Modules; Hardware Components; HW/SW Interface; Software Modules; HW/SW Integration & Cosimulation; Integration System; System Evaluation; Design Verification.]

Page 52: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Partitioning Example: CDMA Searcher

[Figure: block diagram of the searcher: PN-Code Generator; Synchronous Accumulator (Real); Synchronous Accumulator (Image); Energy Calculator (Real); Energy Calculator (Image); Comparison and Selection; Asynchronous Accumulator; Comparison and Selection.]

[Figure: partitioning search tree from PN-Code Generation to GOAL, evaluated by Cost (Speed, Area, Power). Candidate nodes: Synchronous Accumulator (SW); Synchronous Accumulator 1 (HW); Synchronous Accumulator 2 (HW); Energy Estimate (SW); Energy Estimate (HW); Comparator (SW); Comparator with precomputation (HW); Asynchronous Accumulator (SW); Asynchronous Accumulator (HW).]

Page 53: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Approach

[Figure: searcher data path with inputs RXI, TXI, RXQ, TXQ. Stages: synchronous accumulator, energy calculator, max-value selection, comparison with θ1, asynchronous accumulator, comparison with θ2, and a Control Signal Generator. The Yes/No branches of the comparisons lead to Search_Slew or "Search Done".]

$O_I = (RX_I \cdot TX_I) + (RX_Q \cdot TX_Q)$
$O_Q = (RX_I \cdot TX_Q) + (RX_Q \cdot (-TX_I))$
$Y_I = \sum O_I, \quad Y_Q = \sum O_Q$
$Z = \max(Y_I^{2}, Y_Q^{2})$
$\sum Z$

- Software oriented design
- Dark block: Hardware
- Interface: Control signal gen.
- Partitioned in terms of speed cost
- Change from SW to HW
   1. Implementation speed
   2. Parallel architecture
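
The searcher arithmetic on this slide (despreading, synchronous accumulation, energy selection, and asynchronous accumulation) can be written out as a short software reference model. The function names, the block length parameter, and the sample values are assumptions for illustration; the threshold comparisons and the control signal generator are left out.

```python
def despread(rx_i, rx_q, tx_i, tx_q):
    """One chip of correlation: O_I = RXI*TXI + RXQ*TXQ, O_Q = RXI*TXQ + RXQ*(-TXI)."""
    o_i = rx_i * tx_i + rx_q * tx_q
    o_q = rx_i * tx_q + rx_q * (-tx_i)
    return o_i, o_q


def searcher_energy(chips, sync_len):
    """Sum of Z over consecutive blocks of `sync_len` chips.

    chips: list of (rx_i, rx_q, tx_i, tx_q) tuples.
    Per block: Y_I = sum(O_I), Y_Q = sum(O_Q), Z = max(Y_I^2, Y_Q^2).
    A trailing partial block is ignored.
    """
    total = 0
    for begin in range(0, len(chips) - sync_len + 1, sync_len):
        y_i = y_q = 0
        for rx_i, rx_q, tx_i, tx_q in chips[begin:begin + sync_len]:
            o_i, o_q = despread(rx_i, rx_q, tx_i, tx_q)
            y_i += o_i                   # synchronous accumulation
            y_q += o_q
        total += max(y_i ** 2, y_q ** 2)  # energy selection, then asynchronous accumulation
    return total


# Received chips equal to the local PN chips (perfect alignment, no noise).
aligned = [(1, -1, 1, -1), (-1, -1, -1, -1), (1, 1, 1, 1), (-1, 1, -1, 1)]
print(searcher_energy(aligned, sync_len=2))
```

A model like this is the kind of full-software starting point from which individual stages, such as the synchronous accumulator, can be moved to hardware and compared on cycle count and area.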

Page 54: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Result

Configuration                   cycle   ratio   Area (gates)
Full SW                         266     -       -
Full HW                         -       -       9008
Synchronous accumulator (1)     138     48.1    +872
Computing energy (2)            265     4.4     +3096
(1) & (2)                       137     48.5    +3968
(2) & Comparator (3)            265     4.4     +3155
(1) & (3)                       138     48.1    +931

Page 55: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Flexibility vs. Energy-Efficiency

• There is a trade-off between efficiency and flexibility: programmable designs incur significant performance and power penalties compared to ASICs.
• Parallel signal-processing algorithms can achieve significant power savings by executing the dominant computational kernels of a given class of applications with common features on dedicated, optimized processing elements with minimum energy overhead.
• Programmability requires generalized computation, storage, and communication systems, which can be used to implement different kinds of algorithms.
• Domain-specific processors trade some of the generality of a general-purpose programmable device for higher levels of energy efficiency, while maintaining the flexibility to handle a variety of algorithms.

Page 56: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Hybrid Architecture Template (Pleiades)
Arthur Abnous and Jan Rabaey

Pleiades does much better on the energy scale than the TI DSPs, because DSPs are general-purpose and their instruction execution involves a great deal of overhead. Pleiades can create dedicated hardware structures tuned to the task at hand and executes operations with a small energy overhead.

Page 57: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Application Domains: ULTRA-LOW-POWER DOMAIN-SPECIFIC MULTIMEDIA PROCESSORS

• CELP-Based Speech Coding
   – LPC Analysis and Synthesis
   – Codebook Search
   – Lag Computation
• DCT-Based Video Compression and Decompression
   – DCT and Inverse-DCT
   – Motion Estimation and Compensation
   – Huffman Coding and Decoding
• Baseband Processing for Digital Radios
   – Demodulation, Channel Equalization
   – Timing Recovery, Error Correction

Page 58: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

The Re-configurable Terminal

Page 59: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Satellite Processors

Page 60: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Elements of Energy-Efficiency

Page 61: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Multi-Processor Implementation

Page 62: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Switching Activity Reduction

(a) Average activity in a multiplier as a function of the constant value

(b) Parallel and serial implementations of an adder tree.

Page 63: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Communication Network

Page 64: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Distributed Data-Driven Control

Execution of a hardware module is triggered by the arrival of tokens. When there are no tokens to be processed at a given module, no switching activity occurs in that module.
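
A toy software model of this token-driven behavior is sketched below. It is only an analogy for the hardware mechanism: the `Module` class, the `run` loop, and the example pipeline are hypothetical names, and activation counts stand in for switching activity.

```python
from collections import deque


class Module:
    """A processing module that fires only when a token is waiting at its input."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.inbox = deque()
        self.successors = []
        self.activations = 0          # stand-in for switching activity

    def fire(self):
        token = self.inbox.popleft()
        self.activations += 1
        result = self.func(token)
        for successor in self.successors:
            successor.inbox.append(result)
        return result


def run(modules):
    """Keep firing modules that hold tokens; return the outputs of sink modules."""
    outputs = []
    progress = True
    while progress:
        progress = False
        for module in modules:
            if module.inbox:          # no token, no activity
                result = module.fire()
                if not module.successors:
                    outputs.append(result)
                progress = True
    return outputs


scale = Module("scale", lambda v: 2 * v)
offset = Module("offset", lambda v: v + 1)
unused = Module("unused", lambda v: v)     # never receives a token
scale.successors.append(offset)
scale.inbox.extend([1, 2, 3])

print(run([scale, offset, unused]))
print({m.name: m.activations for m in (scale, offset, unused)})
```

The module that never receives a token is never activated, which mirrors the point that a module with no tokens to process causes no switching activity.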

Page 65: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Implementation of Handshaking

Page 66: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Design Methodology

Page 67: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Low Power Circuit Techniques

• Reduced swing interconnect (communication network, memories, programmable logic modules)
• On-chip dc-dc conversion + multiple supply voltages
• Locally synchronous, globally asynchronous
• Automatic power-down
• Optimized libraries (0.6 µm CMOS + Cadence/Synopsys design flow)

Page 68: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

VSELP Synthesis Filter Mapped onto Satellite Processors

Page 69: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Mappings of VSELP Kernel

• The most energy efficient CELP-based speech algorithm
   – dissipates 36 mW (Vdd = 1.8 V, 0.5 µm CMOS)
   – requires 23.4 MOPS
• Proposed VSELP speech coder
   – 0.6 µm CMOS
   – dissipates under 5 mW

Page 70: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

IIR Mapping

Page 71: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

IIR Comparison

Page 72: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

FFT Mapping

Page 73: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

FFT Comparison

Page 74: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Reconfiguration for Power Saving in Real-Time Motion Estimation, S.R. Park, UMASS

Page 75: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Motion Estimation

Page 76: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Block Matching Algorithm

Page 77: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Configurable H/W Paradigms

Page 78: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Programmable Logic Modules

Page 79: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Why Hardware for Motion Estimation?

Most computationally demanding part of video encoding
Example: CCIR 601 format
- 720 by 576 pixels
- 16 by 16 macro block (n = 16)
- 32 by 32 search area (p = 8)
- 25 Hz frame rate (f_frame = 25)
9 Giga Operations/Sec is needed for the Full Search Block Matching Algorithm.
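The 9 Giga Operations/Sec figure can be reproduced with a back-of-the-envelope count. The sketch below assumes three operations per pixel comparison (subtract, absolute value, accumulate) and (2p + 1)^2 candidate positions per block; both assumptions, and the function name, are illustrative and are not stated on the slide.

```python
def full_search_ops_per_second(width, height, p, frame_rate, ops_per_pixel=3):
    """Estimate operations/second for full-search block matching.

    Assumes every pixel of the frame is compared at (2p + 1)^2 candidate
    displacements, and that each comparison costs `ops_per_pixel` operations
    (assumed here: subtract, absolute value, accumulate).
    """
    candidates = (2 * p + 1) ** 2
    return ops_per_pixel * candidates * width * height * frame_rate


# CCIR 601 example from the slide: 720 x 576 pixels, p = 8, 25 Hz.
ops = full_search_ops_per_second(720, 576, p=8, frame_rate=25)
print(f"{ops / 1e9:.2f} GOPS")  # prints "8.99 GOPS"
```

Under these assumptions the count grows with the square of the search range p, so the search range is the main lever on the operation count.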

Page 80: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Why Reconfiguration in Motion Estimation?

Adjusting the search area at frame-rate according to the changing characteristics of video sequences

Reducing Power Consumption by avoiding unnecessary computation

Motion Vector Distributions
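As a rough illustration of why a smaller search area avoids computation, the sketch below implements exhaustive block matching with a sum-of-absolute-differences (SAD) cost and counts how many candidate positions are evaluated for a given search range p. The SAD criterion, the function names, and the tiny synthetic frame are assumptions made for illustration; the slide does not specify them.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n-by-n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Try every displacement in [-p, p] x [-p, p] that stays inside `ref`.

    Returns (best motion vector, its SAD, number of candidates evaluated).
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_sad, candidates = None, None, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            candidates += 1
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad, candidates


# Synthetic 12 x 12 reference frame; the current frame is the reference
# shifted by (dx, dy) = (2, 1), so the true motion vector is known.
SIZE = 12
ref = [[7 * x + 13 * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[(y + 1) % SIZE][(x + 2) % SIZE] for x in range(SIZE)]
       for y in range(SIZE)]

for p in (1, 2):
    mv, cost, n_cand = full_search(cur, ref, bx=4, by=4, n=4, p=p)
    print(f"p={p}: mv={mv}, SAD={cost}, candidates={n_cand}")
```

With p = 2 the search covers the true displacement; with p = 1 it evaluates fewer candidates but cannot reach it, which is the trade-off a per-frame choice of search area has to balance.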

Page 81: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Architecture for Motion Estimation

From P. Pirsch et al., "VLSI Architectures for Video Compression", Proc. of the IEEE, 1995

Page 82: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Re-configurable Architecture for ME

Page 83: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Power Estimation in Reconfigurable Architecture

Page 84: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Power vs Search area

Page 85: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Resource Reuse in FPGAs

Page 86: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Motion Estimation

Page 87: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Motion Estimation (low power)

[Equation: P_a and P_b expressed in terms of P_add and P_abs; the value 0.45 appears in the result]

Therefore, the power reduction factor is 11%.

Page 90: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

DIGLOG multiplier

$C_{mult}(n) = 253\,n^2, \quad C_{add}(n) = 214\,n$, where n = word length in bits

$A = 2^j + A_R, \quad B = 2^k + B_R$

$A \cdot B = (2^j + A_R)(2^k + B_R) = 2^{j+k} + 2^j B_R + 2^k A_R + A_R B_R$

                     1st Iter   2nd Iter   3rd Iter
Worst-case error     -25%       -6%        -1.6%
Prob. of Error<1%    10%        70%        99.8%

With an 8 by 8 multiplier, the exact result can be obtained at a maximum of seven iteration steps (worst case)
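The iteration can be sketched in a few lines. This is a minimal illustration, assuming each step splits an operand into its leading power of two (2^j, 2^k) and a residue (A_R, B_R), accumulates the three shift-and-add terms, and defers the residual product A_R * B_R to the next step. The function name is hypothetical, and the way iterations are counted here may differ from the slide's.

```python
def diglog_multiply(a, b, iterations):
    """Approximate a * b for non-negative integers with shifts and adds only.

    Each iteration writes A = 2^j + A_R and B = 2^k + B_R, accumulates
    2^(j+k) + 2^j * B_R + 2^k * A_R, and leaves the residual product
    A_R * B_R for the next iteration.
    """
    result = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break  # residual product is zero, so the result is already exact
        j = a.bit_length() - 1  # position of the leading one of A
        k = b.bit_length() - 1  # position of the leading one of B
        a_r = a - (1 << j)
        b_r = b - (1 << k)
        result += (1 << (j + k)) + (b_r << j) + (a_r << k)
        a, b = a_r, b_r
    return result


exact = 200 * 100
for it in (1, 2, 3):
    approx = diglog_multiply(200, 100, it)
    print(it, approx, f"{(approx - exact) / exact:+.1%}")
```

Because every dropped term is non-negative, the approximation never exceeds the exact product, which matches the negative sign of the worst-case errors in the table.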

Page 91: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Low Power CDMA Searcher Project

Project title: Low-power design of an IS-95-based DS/CDMA system using co-design techniques

Development period: 1999.3.1 - 2000.2.28 (12 months)

Objective and approach: RTL-level low-power design and implementation of the searcher engine of the MSM (Mobile Station Modem) chip used in CDMA handsets. Operating frequency: 12.5 MHz

Low-power design using data flow graph based rescheduling, pre-computation, strength reduction, and a synchronous accumulator; area and power reduced by up to 67.68% and 41.35%, respectively.

H/W and S/W co-design techniques applied.

San Kim and Jun-Dong Cho, “Low Power CDMA Searcher”, CAD and VLSI Workshop, May 1999.

Inki Hwang, San Kim and Jun-Dong Cho, “CDMA Searcher Co-Design”, ASIC Workshop, Sep. 1999.

Page 92: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Voltage Scaling

Merely changing a processor's clock frequency is not an effective technique for reducing energy consumption. Reducing the clock frequency reduces the power consumed by a processor, but it does not reduce the energy required to perform a given task, because the task takes proportionally longer.

Lowering the voltage along with the clock actually alters the energy-per-operation of the microprocessor, reducing the energy required to perform a fixed amount of work.
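The argument can be made concrete with the usual first-order model of dynamic CMOS power. The symbols C_eff (effective switched capacitance per cycle) and N (number of clock cycles the task needs) are introduced here for illustration and do not appear on the slide; leakage and short-circuit power are ignored.

```latex
P = C_{\mathrm{eff}}\, V_{dd}^{2}\, f, \qquad
t = \frac{N}{f}, \qquad
E = P\,t = N\, C_{\mathrm{eff}}\, V_{dd}^{2}
```

The clock frequency f cancels out of E, so under this model slowing the clock alone only stretches the same energy over a longer time, whereas lowering V_dd reduces the energy of the task quadratically.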

Page 93: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

OS: Voltage Scaling

Page 94: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

OS: Voltage Scheduling

Page 95: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Multiple Supply Voltages: Filter Example

Page 96: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Scale Supply Voltage with fCLK

Page 97: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Adaptive Power Supply Voltages

Page 98: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Different Voltage Schedules

[Figure: three voltage schedules for the same 1000 Mcycle task; x-axis: Time (sec), 0 to 25; y-axis: Energy consumption (Vdd^2); timing constraint marked]
(A) 5.0^2, 50 MHz, 1000 Mcycles: 40 J
(B) 5.0^2, 50 MHz, 750 Mcycles, then 2.5^2, 25 MHz, 250 Mcycles: 32.5 J
(C) 4.0^2, 40 MHz, 1000 Mcycles: 25 J
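The energy figures for the three schedules, (A) 1000 Mcycles at 5.0 V and 50 MHz, (B) 750 Mcycles at 5.0 V and 50 MHz followed by 250 Mcycles at 2.5 V and 25 MHz, and (C) 1000 Mcycles at 4.0 V and 40 MHz, can be checked with a short calculation. The sketch assumes energy per cycle is proportional to Vdd^2, with the proportionality constant calibrated so that schedule (A) comes to 40 J; that constant and the function name are assumptions, not values given on the slide.

```python
# Assumed constant: joules per cycle per V^2, calibrated so schedule (A) = 40 J.
K_JOULES_PER_CYCLE_V2 = 40.0 / (1000e6 * 5.0 ** 2)  # 1.6e-9


def schedule_cost(phases):
    """Return (energy in J, time in s) for a list of (cycles, vdd, f_hz) phases."""
    energy = sum(K_JOULES_PER_CYCLE_V2 * cycles * vdd ** 2
                 for cycles, vdd, _ in phases)
    time = sum(cycles / f_hz for cycles, _, f_hz in phases)
    return energy, time


schedules = {
    "A": [(1000e6, 5.0, 50e6)],
    "B": [(750e6, 5.0, 50e6), (250e6, 2.5, 25e6)],
    "C": [(1000e6, 4.0, 40e6)],
}
for name, phases in schedules.items():
    energy, time = schedule_cost(phases)
    print(f"({name}) {energy:.1f} J in {time:.0f} s")
```

Under this model (A) finishes in 20 s while (B) and (C) both take 25 s, and the constant-voltage schedule (C) needs the least energy of the three, at about 25.6 J against the figure's rounded 25 J.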

Page 99: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Data Driven Signal Processing

The basic idea of averaging: two samples are buffered and their workloads are averaged.

The averaged workload is then used as the effective workload to drive the power supply.

Using a ping-pong buffering scheme, data samples I_{n+2}, I_{n+3} are being buffered while I_n, I_{n+1} are being processed.
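A small numerical sketch shows why averaging workloads can help. It rests on assumptions not stated on the slide: the clock frequency is proportional to the workload per sample period, the supply voltage scales linearly with clock frequency, and energy is cycles times V^2, so the relative energy per sample grows as the cube of the normalized workload. All function names are hypothetical.

```python
def energy_per_sample(workload):
    """Relative energy for one sample period at a normalized workload in [0, 1].

    Assumed model: f ~ workload, Vdd ~ f, energy = cycles * Vdd^2,
    hence energy ~ workload^3.
    """
    return workload ** 3


def energy_without_averaging(workloads):
    """Each sample is processed at the rate its own workload demands."""
    return sum(energy_per_sample(w) for w in workloads)


def energy_with_pair_averaging(workloads):
    """Buffer samples in pairs and process each pair at its average workload."""
    total = 0.0
    for i in range(0, len(workloads) - 1, 2):
        avg = (workloads[i] + workloads[i + 1]) / 2
        total += 2 * energy_per_sample(avg)
    if len(workloads) % 2:  # an unpaired trailing sample is processed as-is
        total += energy_per_sample(workloads[-1])
    return total


workloads = [0.2, 1.0, 0.5, 0.7]
print(energy_without_averaging(workloads))
print(energy_with_pair_averaging(workloads))
```

Because the assumed energy function is convex, running a pair at its average workload never costs more than running each sample at its own workload; the price is the extra latency and memory of the buffer.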

Page 100: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Example of Buffering

Page 101: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

SOC CAD Companies

Avant! www.avanticorp.com
Cadence www.cadence.com
Duet Tech www.duettech.com
Escalade www.escalade.com
Logic visions www.logicvision.com
Mentor Graphics www.mentor.com
Palmchip www.palmchip.com
Sonic www.sonicsinc.com
Summit Design www.summit-design.com
Synopsys www.synopsys.com
Topdown design solutions www.topdown.com
Xynetix Design Systems www.xynetix.com
Zuken-Redac www.redac.co.uk

Page 102: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Viterbi decoder project

▶ Project title: Research on a low-power decoding algorithm for convolutional encoders
▶ Development period: 1999.02.22 - 11.30 (about 9 months)
▶ Objective and approach: Research and development of proprietary technology for reducing the power of the channel coding unit included in IMT-2000
▶ Main CODEC specifications:
- Code rate: R = 1/2, 1/3, 1/4, k = 9
- Decoding method: trace-back Viterbi decoder using soft decision

Page 103: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Viterbi decoder project

▶ Published papers

1. Asia Pacific Conference on ASIC '99
In this paper, we have presented the use of the consensus term and a clocking control signal in the ACSU for a low-power Viterbi decoder. A 20% reduction in area and a 30% reduction in power consumption are obtained based on the low-power ACSU architecture [1]. Applying our proposed glitch reduction techniques to [1], power consumption is reduced by an additional 7% at the cost of a 3% increase in area.

2. International Conference on VLSI and CAD '99
In this paper, we propose a new low-power algorithm for the trace-back unit of a systolic array Viterbi decoder [2]. Reusing the already-generated trace-back routes reduces the number of trace-back operations and enlarges the region in which switching activity would be spurious. The switching activity during the trace-back operation is therefore further reduced by using gated clocks. Our results show, on average, a 40% reduction in power at the same latency, but a 23% increase in area, compared with the trace-back unit in [2]. We used Synopsys Design Compiler and measured power consumption with Synopsys DesignPower.

Page 104: Low Power System Level Design Methodologies Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. .

Viterbi decoder project

▶ Reference

1. B C. Y. Tsui, R. S. K. Cheng and C. Ling, “Using Transformation to Reduce Power Consumption of IS-95 CDMA Receiver”, International Symposium on Low Power Electronics and Design, 1999.

2. T. K. Truong, A. M. T. Shih, I. S. Reed, E. H. Satorius, “A VLSI Design for a Trace-back Viterbi Decoder”, IEEE Trans. Communication, vol. 40, no. 3, Mar. 1992.