Structural and Temporal Control for Simultaneous Speed and Power Improvement Applied on a 32x32 Dynamic Wallace Tree Multiplier EE241 Prof. Jan Rabaey Zhujie Lin and Michael Liao

Dec 21, 2015

Page 1

Structural and Temporal Control for Simultaneous Speed and Power Improvement

Applied on a 32x32 Dynamic Wallace Tree Multiplier

EE241 Prof. Jan Rabaey

Zhujie Lin and Michael Liao

Page 2

Motivation

Faster Evaluation

Lower Power

Performance and Power determined by the typical case, not the worst case

Page 3

The Leakage Issue

There is only one large “resistor”

Leakage current increases with technology scaling

Solution? Introduce more large “resistors”

[Figure: Leakage Paths without Sleep Mode. Panels: Precharge, Eval. Labels: Rp, Rn]

Page 4

Sleep Mode

When the dynamic circuit is in sleep mode, there is an extra large sleep “resistor”

[Figure: Leakage Paths with Sleep Mode. Panels: Precharge, Eval-Sleep. Labels: Rp, Rn, R_sleep]

Page 5

The Utilization Issue

Unused parts of the multiplier still see the clock

Cost: CV^2 in power

The clock tree dissipates power

Solution: Turn on only active parts of the multiplier

Page 6

Power Dissipation w/o Sleep Mode

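$$P = \left(\alpha_{\text{precharge}} P_{\text{precharge}} + \alpha_{\text{eval}} P_{\text{eval}}\right) F_{\text{block}}$$

$$P_{\text{precharge}} = I_{\text{precharge}} V_{DD}$$

$$P_{\text{eval}} = C V^2 + I_{\text{leak}} V_{DD}$$

$$\alpha_{\text{precharge}} + \alpha_{\text{eval}} = 1$$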

Page 7

Power Dissipation w/Sleep Mode

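$$P = \left(\alpha_{\text{precharge}} P_{\text{precharge}} + \alpha_{\text{eval}} P_{\text{eval}} + \alpha_{\text{sleep}} P_{\text{sleep}}\right) F'_{\text{block width}}$$

$$P_{\text{precharge}} = I_{\text{precharge}} V_{DD}$$

$$P_{\text{eval}} = C V^2 + I_{\text{leak}} V_{DD}$$

$$P_{\text{sleep}} = I_{\text{sleep}} V_{DD}$$

$$\alpha_{\text{precharge}} + \alpha_{\text{eval}} + \alpha_{\text{sleep}} = 1$$

[Equation: relation between F' and F_clock; only the terms F', F_clock, and a division survive]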
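The two power slides share one structure: a weighted sum of per-mode power terms, with weights that add up to 1, scaled by a block frequency. A minimal Python sketch of that structure follows. The function name `weighted_power`, the weight values, and every current and capacitance figure are hypothetical, normalized numbers chosen only for illustration; the same frequency factor is used in both cases, which ignores the F' term of the sleep-mode slide.

```python
def weighted_power(weights, powers, f_block):
    """Return (sum_i weight_i * P_i) * f_block for the modes listed in `weights`.

    The weights must sum to 1, mirroring the constraint on the slides.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mode weights must sum to 1")
    return sum(weights[mode] * powers[mode] for mode in weights) * f_block


# Hypothetical, normalized values (not taken from the slides).
VDD = 1.0
powers = {
    "precharge": 0.20 * VDD,             # I_precharge * V_DD
    "eval": 1.0 * VDD**2 + 0.05 * VDD,   # C * V^2 + I_leak * V_DD
    "sleep": 0.01 * VDD,                 # I_sleep * V_DD
}

# Without sleep mode: the block is always precharging or evaluating.
no_sleep = weighted_power({"precharge": 0.5, "eval": 0.5}, powers, f_block=1.0)

# With sleep mode: part of the cycle is spent in the low-leakage state.
with_sleep = weighted_power(
    {"precharge": 0.2, "eval": 0.3, "sleep": 0.5}, powers, f_block=1.0
)

print(f"without sleep: {no_sleep:.3f}, with sleep: {with_sleep:.3f}")
```

With these made-up numbers the sleep-mode case comes out lower because half of the cycle is weighted by the small sleep term. Whether that holds for a real design depends on the actual currents and on how the weights shift.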

Page 8

Sleep Mode

[Figure: a dynamic gate (VDD, PDN, GND) shown in three modes, each with its pair of clock input values: Precharge Mode (0, 0), Sleep Mode (1, 0), Evaluation Mode (1, 1)]
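The three modes can be read as a truth table over the two clock inputs. A minimal Python sketch, assuming the first value of each pair drives the PMOS precharge transistor and the second drives the NMOS foot transistor (the slide does not label them; `pmos_clk`, `nmos_clk`, `Mode`, and `decode_mode` are illustrative names):

```python
from enum import Enum


class Mode(Enum):
    PRECHARGE = "precharge"
    SLEEP = "sleep"
    EVALUATION = "evaluation"


# (pmos_clk, nmos_clk) -> mode, following the value pairs shown on the slide.
_MODE_TABLE = {
    (0, 0): Mode.PRECHARGE,   # PMOS on, NMOS foot off: output node is charged
    (1, 0): Mode.SLEEP,       # both clocked transistors off
    (1, 1): Mode.EVALUATION,  # PMOS off, NMOS foot on: PDN may discharge
}


def decode_mode(pmos_clk: int, nmos_clk: int) -> Mode:
    """Map the two clock levels to one of the three modes on the slide."""
    try:
        return _MODE_TABLE[(pmos_clk, nmos_clk)]
    except KeyError:
        # (0, 1) would turn both clocked transistors on at once; the slide
        # does not list it as a mode.
        raise ValueError(f"unsupported clock pair: {(pmos_clk, nmos_clk)}") from None


for pair in [(0, 0), (1, 0), (1, 1)]:
    print(pair, decode_mode(*pair).value)
```

Under this assumption, sleep is the state in which both clocked transistors are off, which matches the earlier picture of an extra large "resistor" in the leakage path.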

Page 9

Sleep Mode and Pulsed Clock

[Figure: Sleep Mode and the Use of the Pulsed Clock. Waveforms: PMOS CLK, NMOS CLK. Phase labels: Precharge, Sleep, Evaluation]

Page 10

Visualizing a Wallace Tree as Equal-delay Layers

[Figure: Multiplier block diagram. Labels: AND Gates, Equal-Delay Layers, Vector Add]
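To make the layer view concrete, the number of reduction layers can be counted for a given number of partial-product rows. The sketch below assumes each equal-delay layer is a row of 3:2 carry-save adders, as in a textbook Wallace tree; the slides do not state the compressor type, and `wallace_layers` is an illustrative name.

```python
def wallace_layers(rows: int) -> int:
    """Count 3:2 carry-save layers needed to reduce `rows` operands to 2.

    Each layer turns every full group of 3 rows into 2 rows and passes
    any leftover rows through unchanged.
    """
    if rows < 0:
        raise ValueError("rows must be non-negative")
    layers = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        layers += 1
    return layers


for n in (8, 16, 24, 32):
    print(n, "rows ->", wallace_layers(n), "layers")
# 8 -> 4, 16 -> 6, 24 -> 7, 32 -> 8
```

Under this assumption a full 32-row reduction needs 8 layers, while an operand that produces only 8 rows needs 4, which is the gap between the worst case and the typical case that the motivation slide points to.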

Page 11

Pulsed-Leap Clock

[Figure: Normal Domino Logic Clock. Labels: CLK, Equal Delay Layers of Logic]

Page 12

Pulsed-Leap Clock

[Figure: Worst Case Operation for Pulsed Clock. Waveforms: External CLK, PMOS CLK, NMOS CLK, shown against the Equal Delay Layers of Logic]

Page 13

Pulsed-Leap Clock

[Figure: Typical Case of Pulsed-Leap Clock. Waveforms: External CLK, PMOS CLK, NMOS CLK, shown against the Equal Delay Layers of Logic, with several layers marked Sleep]

Page 14

Additional Circuitry

MSB Detection

Clock/Pulse Generation

Leap Control

Page 15

MSB Detection

[Figure: MSB detection circuit. Labels: CLK]
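The idea behind MSB detection can be sketched behaviorally: find how many bits of an operand are actually significant, then round up to the granularity at which parts of the multiplier can be switched off. The 8-bit granularity below is an assumption suggested by the 0, 8, 16, 24, 32 tick marks on the results plots; the function names are illustrative and this is not a model of the circuit on the slide.

```python
def significant_bits(x: int, width: int = 32) -> int:
    """Return the position of the most significant 1 bit (0 if x == 0)."""
    if not 0 <= x < (1 << width):
        raise ValueError(f"operand does not fit in {width} bits")
    return x.bit_length()


def active_width(x: int, group: int = 8, width: int = 32) -> int:
    """Round the significant bit count up to a multiple of `group` (assumed)."""
    bits = significant_bits(x, width)
    return -(-bits // group) * group  # ceiling division, then scale back up


for value in (0x0, 0x7F, 0x100, 0xFFFFFFFF):
    print(hex(value), significant_bits(value), active_width(value))
```

An operand such as `0x100` has 9 significant bits and so falls into the 16-bit group under this assumption.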

Page 16

Clock/Pulse Generator

[Figure: clock/pulse generator. Signals: CLK_EN, CLK, NMOS CLK]

Page 17

Leap Control

[Figure: leap control block diagram. Labels: inputs A and B, MSB Detection, AND Gates, Pulse Gen (one per layer), Vector Add, Clk, Leap Clk]
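One way to picture leap control is as a small planning step: detect the operand width, work out how many reduction layers that width needs, and leave the remaining layers asleep. The sketch below is a behavioral illustration only. It assumes that the significant bits of operand B set the number of partial-product rows and that each layer is a row of 3:2 carry-save adders; neither is stated on the slides, and `leap_plan` and its result keys are hypothetical names.

```python
def layers_for(rows: int) -> int:
    """3:2 carry-save layers needed to reduce `rows` operands to 2 (assumed)."""
    layers = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        layers += 1
    return layers


def leap_plan(a: int, b: int, width: int = 32) -> dict:
    """Decide how many layers to clock and how many to leave asleep."""
    for operand in (a, b):
        if not 0 <= operand < (1 << width):
            raise ValueError(f"operand does not fit in {width} bits")
    active_rows = b.bit_length()          # MSB detection on operand B
    clocked = layers_for(active_rows)     # layers that must evaluate
    total = layers_for(width)             # layers in the full tree
    return {
        "active_rows": active_rows,
        "layers_clocked": clocked,
        "layers_asleep": total - clocked,
    }


print(leap_plan(12345, 0xFF))
print(leap_plan(12345, 0xFFFFFFFF))
```

In this model an 8-bit operand B clocks 4 of the 8 layers and leaves the other 4 asleep, while a full 32-bit operand clocks all 8.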

Page 18

Design Choices

[Figure: design choices, panels (a) and (b)]

Page 19

Design Choices

[Figure: design choices, panels (c) and (d)]

Page 20

Results - Power

[Figure: Energy Consumption. Y axis: Energy/cycle (J), 0 to 8E-11. X axis: Input Bits, 0 to 32. Series: Benchmark, Pulsed-Leap Clock]

Page 21

Results - Delay

[Figure: Delay. Y axis: Delay (ps), 0 to 2500. X axis: Input Bits, 0 to 32. Series: Benchmark, Pulsed-Leap Clock]

Page 22

Results - Improvements

[Figure: % Improvement over Benchmark. Y axis: % Improvement, 0 to 80. X axis: Input Bits, 0 to 32. Series: Energy, Performance]

Page 23

Application

FPGA, Multimedia Processors, ALUs

Asynchronous Pipeline

[Figure: asynchronous pipeline. Labels: Data In, FIFO, X, FIFO, Data Out]