Optimizing Power @ Design Time Interconnect and Clocks

Feb 26, 2016

Transcript
Page 1: Optimizing Power @ Design Time Interconnect and Clocks

Jan M. Rabaey

Low Power Design Essentials ©2008 Chapter 6

Optimizing Power @ Design Time

Interconnect and Clocks

Page 2: Optimizing Power @ Design Time Interconnect and Clocks

Chapter Outline

Trends and bounds
An OSI approach to interconnect optimization
– Physical layer
– Data link and MAC
– Network
– Application
Clock distribution

Page 3: Optimizing Power @ Design Time Interconnect and Clocks

ITRS Projections

Calendar Year                           2012        2018        2020
Interconnect One Half Pitch             35 nm       18 nm       14 nm
MOSFET Physical Gate Length             14 nm       7 nm        6 nm
Number of Interconnect Levels           12-16       14-18       14-18
On-Chip Local Clock                     20 GHz      53 GHz      73 GHz
Chip-to-Board Clock                     15 GHz      56 GHz      89 GHz
# of Hi Perf. ASIC Signal I/O Pads      2500        3100        3100
# of Hi Perf. ASIC Power/Ground Pads    2500        3100        3100
Supply Voltage                          0.7-0.9 V   0.5-0.7 V   0.5-0.7 V
Supply Current                          283-220 A   396-283 A   396-283 A

[Source: ITRS Roadmap, 2004, 2005]

Page 4: Optimizing Power @ Design Time Interconnect and Clocks

Increasing Impact of Interconnect

Interconnect is now exceeding transistors in
– Latency
– Power dissipation
– Manufacturing complexity

Direct consequence of scaling

Page 5: Optimizing Power @ Design Time Interconnect and Clocks

Communication Dominant Part of Power Budget

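[Figure: pie charts of the power budget; panels labeled FPGA, µProcessor, Signal processor, and ASSP. Legend entries: Interconnect, Clock, I/O, CLB (65%, 21%, 9%, 5%); Clock, Logic, Memory, I/O; Clocks, Caches, Execution Units, Control, I/O Drivers (40%, 20%, 15%, 15%, 10%)]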

Page 6: Optimizing Power @ Design Time Interconnect and Clocks

Idealized Wire Scaling Model

Parameter   Relation   Local Wire   Constant Length   Global Wire
W, H, t                1/S          1/S               1/S
L                      1/S          1                 1/S_C
C           LW/t       1/S          1                 1/S_C
R           L/WH       S            S^2               S^2/S_C
tp ~ CR     L^2/Ht     1            S^2               S^2/S_C^2
E           CV^2       1/(S U^2)    1/U^2             1/(S_C U^2)
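
As a rough illustration, the scaling relations in the table above can be evaluated numerically. The sketch below is only a convenience wrapper around the table entries; the function name and the sample values for S, S_C and U are assumptions chosen for illustration.

```python
def wire_scaling(S, S_C, U):
    """Return the idealized scaling factors from the table for three wire classes.

    S   : technology scaling factor (wire dimensions shrink by 1/S)
    S_C : scaling factor applied to the length of global wires (table's SC)
    U   : voltage scaling factor (table's U)
    """
    return {
        "local": {
            "L": 1 / S, "C": 1 / S, "R": S,
            "tp": 1.0, "E": 1 / (S * U**2),
        },
        "constant_length": {
            "L": 1.0, "C": 1.0, "R": S**2,
            "tp": S**2, "E": 1 / U**2,
        },
        "global": {
            "L": 1 / S_C, "C": 1 / S_C, "R": S**2 / S_C,
            "tp": S**2 / S_C**2, "E": 1 / (S_C * U**2),
        },
    }

# Sample values only: dimensions halve, global wires double, voltage halves.
factors = wire_scaling(S=2.0, S_C=0.5, U=2.0)
for wire, row in factors.items():
    print(wire, row)
```

In every column the delay entry equals the product of the R and C entries, which is a quick consistency check on the table.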

Page 7: Optimizing Power @ Design Time Interconnect and Clocks

Distribution of Wire Lengths on Chip

[Ref: J. Davis, C&S’98]

© IEEE 1998

Page 8: Optimizing Power @ Design Time Interconnect and Clocks

Technology Innovations

Reduce dielectric permittivity (e.g. Aerogels or air)

Reduce resistivity (e.g. Copper)

Reduce wirelengths through 3D-integration

Novel interconnect media (carbon nanotubes, optical)

(Pictures courtesy of IBM and IFC FCRP)

© IEEE 1998

Page 9: Optimizing Power @ Design Time Interconnect and Clocks

Logic Scaling

[Figure: log-log plot of power P [W] (10^-15 to 10^0) versus delay tp [s] (10^-12 to 10^0), with constant-energy lines from 10^-18 J to 10^-6 J. Annotation: P·tp ~ 1/S^3]

[Ref: J. Davis, Proc’01]

Page 10: Optimizing Power @ Design Time Interconnect and Clocks

Interconnect Scaling

[Figure: log-log plot of reciprocal length squared L^-2 [cm^-2] (10^-4 to 10^10), with the corresponding length L [cm] (10^-5 to 10^2) on a second axis, versus delay t [s] (10^-18 to 10^-3). Lines of constant L^-2 t: 10^-13 (1000 µ), 10^-11 (100 µ), 10^-9 (10 µ), 10^-7 (1 µ); highlighted line L^-2 t = 10^-5 [s/cm^-2] (F = 0.1 µ). Annotation: L^-2 t ~ S^2]

[Ref: J. Davis, Proc’01]

Page 11: Optimizing Power @ Design Time Interconnect and Clocks

Lower Bounds on Interconnect Energy

Claude Shannon

$C = B \log_2\left(1 + \frac{P_S}{kTB}\right)$

C: capacity in bits/sec
B: bandwidth
P_S: average signal power

$E_{bit} = P_S / C$

$E_{bit}(\min) = E_{bit}(C/B \to 0) = kT \ln(2)$

Valid for an “infinitely long” bit transition (C/B → 0)
Equals approximately 2.9·10^-21 J/bit at room temperature (T = 300 K)

Shannon’s theorem on maximum capacity of communication channel

[Ref: J. Davis, Proc’01]
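
The limit above can be checked numerically. The following is a minimal sketch, assuming T = 300 K for room temperature and an arbitrary 1 GHz bandwidth; the function names are chosen for illustration only. It shows E_bit approaching kT ln(2), on the order of 10^-21 J/bit, as the signal power (and hence C/B) shrinks.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K

def capacity(bandwidth_hz, signal_power_w, temperature_k=300.0):
    """Shannon capacity C = B log2(1 + Ps / (k T B)) in bits/s."""
    noise_power = K_BOLTZMANN * temperature_k * bandwidth_hz
    return bandwidth_hz * math.log2(1.0 + signal_power_w / noise_power)

def energy_per_bit(bandwidth_hz, signal_power_w, temperature_k=300.0):
    """E_bit = Ps / C in joules per bit."""
    return signal_power_w / capacity(bandwidth_hz, signal_power_w, temperature_k)

def min_energy_per_bit(temperature_k=300.0):
    """Limit of E_bit as C/B -> 0: k T ln(2)."""
    return K_BOLTZMANN * temperature_k * math.log(2.0)

# As the signal power shrinks (C/B -> 0), E_bit approaches k T ln 2.
B = 1e9  # bandwidth in Hz, arbitrary sample value
for ps in (1e-9, 1e-12, 1e-15):
    print(f"Ps = {ps:.0e} W -> E_bit = {energy_per_bit(B, ps):.3e} J")
print(f"bound: {min_energy_per_bit():.3e} J")
```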

Page 12: Optimizing Power @ Design Time Interconnect and Clocks

Reducing Interconnect Power/Energy

Same philosophy as with logic: reduce capacitance, voltage (or voltage swing) and/or activity

A major difference: sending a bit (or bits) from one point to another is fundamentally a communications/networking problem, and it helps to consider it as such.

Abstraction layers are different:
– For computation: device, gate, logic, micro-architecture
– For communication: wire, link, network, transport

Helps to organize along abstraction layers, well understood in the networking world: the OSI protocol stack

Page 13: Optimizing Power @ Design Time Interconnect and Clocks

OSI Protocol Stack

Reference model for wired and wireless protocol design — Also useful guide for conception and optimization of on-chip communication

Layered approach allows for orthogonalization of concerns and decomposition of constraints

[Figure: OSI protocol stack, top to bottom: Presentation/Application, Session, Transport, Network, Data Link, Physical]

No requirement to implement all layers of the stack
Layered structure need not be maintained in the final implementation

[Ref: M. Sgroi, DAC’01]

Page 14: Optimizing Power @ Design Time Interconnect and Clocks

The Physical Layer

Transmit bits over physical interconnect medium (wire)
Physical medium
– Material choice, repeater insertion
Signal waveform
– Discrete levels, pulses, modulated sinusoids
Voltages
– Reduced swing
Timing, synchronization

[Figure: OSI protocol stack (Presentation/Application, Session, Transport, Network, Data Link, Physical)]

So far, on-chip communication has been almost exclusively “level-based”

Page 15: Optimizing Power @ Design Time Interconnect and Clocks

Repeater Insertion

Optimal repeater insertion results in wire delay linear with L

$t_p \propto L \sqrt{(R_d C_d)(r_w c_w)}$

with R_d C_d and r_w c_w the intrinsic delays of inverter and wire, respectively

But: At major energy cost!

Page 16: Optimizing Power @ Design Time Interconnect and Clocks

Repeater Insertion ─ Example

1 cm Cu wire in 90 nm technology (on intermediate layers)
– rw = 250 Ω/mm; cw = 200 fF/mm
– tp = 0.69 rw cw L^2 = 3.45 nsec

Optimal driver insertion:
– tpopt = 0.5 nsec
– Requires insertion of 13 repeaters
– Energy per transition 8 times larger than just charging the wire (6 pJ versus 0.75 pJ)!

It pays to back off!
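
The arithmetic of this example can be reproduced in a few lines. The sketch below only re-evaluates the numbers quoted on the slide; the variable and function names are illustrative, and the repeater delay and energy figures are taken as given rather than derived.

```python
R_W = 250.0       # wire resistance, ohm/mm (from the slide)
C_W = 200e-15     # wire capacitance, F/mm (from the slide)
LENGTH_MM = 10.0  # 1 cm wire

def unrepeated_delay(r_w, c_w, length):
    """Delay of the undivided wire, tp = 0.69 * rw * cw * L^2, in seconds."""
    return 0.69 * r_w * c_w * length**2

tp = unrepeated_delay(R_W, C_W, LENGTH_MM)
print(f"tp = {tp * 1e9:.2f} ns")                      # tp = 3.45 ns

TP_OPT = 0.5e-9     # delay with optimal repeater insertion (from the slide)
E_REPEATED = 6e-12  # energy per transition with repeaters (from the slide)
E_WIRE = 0.75e-12   # energy for charging the wire alone (from the slide)

print(f"speed-up     = {tp / TP_OPT:.1f}x")           # speed-up     = 6.9x
print(f"energy ratio = {E_REPEATED / E_WIRE:.1f}x")   # energy ratio = 8.0x
```

A delay improvement of roughly 7x is bought with an 8x increase in energy per transition, which is why backing off from the minimum-delay point pays.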

Page 17: Optimizing Power @ Design Time Interconnect and Clocks

Wire Energy-Delay Trade-off

[Figure: normalized energy eNorm (0.1 to 1) versus normalized delay dNorm (1 to 8) for L = 1 cm (Cu) in 90 nm CMOS. Marked: the point (dMin, eMax), the repeater overhead, and the level of wire energy only]

Page 18: Optimizing Power @ Design Time Interconnect and Clocks

Multi-dimensional Optimization

Design parameters: voltage, number of stages, buffer sizes

Voltage scaling has largest impact, followed by selection of number of repeaters

Transistor sizing secondary.

[Figure: number of stages (0 to 12) and VDD (0.5 to 1.2 V) versus normalized delay dNorm (1 to 8)]

Page 19: Optimizing Power @ Design Time Interconnect and Clocks

Reduced Swing

$E_{bit} = C V_{DD} V_{swing}$

Concerns:
– Overhead (area, delay)
– Robustness (supply noise, crosstalk, process variations)
– Repeaters?

[Figure: transmitter (TX) driving a wire to a receiver (RX)]
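
To get a feel for the energy expression above, the sketch below evaluates C·VDD·Vswing for a few swings. The wire capacitance, supply voltage and swing values are assumed sample numbers, not figures from the slides.

```python
def energy_per_bit(c_wire, v_dd, v_swing):
    """E_bit = C * VDD * Vswing in joules (charge C*Vswing drawn from VDD)."""
    return c_wire * v_dd * v_swing

C_WIRE = 2e-12  # wire capacitance in F, assumed sample value
V_DD = 1.0      # supply voltage in V, assumed sample value

for v_swing in (1.0, 0.5, 0.2):
    e = energy_per_bit(C_WIRE, V_DD, v_swing)
    print(f"Vswing = {v_swing:.1f} V -> E_bit = {e * 1e12:.2f} pJ")
```

With the swing derived from the full supply, the saving is linear in the swing; the overhead and robustness concerns listed above are not modeled here.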

Page 20: Optimizing Power @ Design Time Interconnect and Clocks

Traditional Level Converter

Requires two discrete voltage levels
Asynchronous level conversion adds extra delay

[Figure: level-converter circuit with supplies VDDH and VDDL, input in, load CL, and outputs OUT]

[Ref: H. Zhang, TVLSI’00]

Page 21: Optimizing Power @ Design Time Interconnect and Clocks

Avoiding Extra References

[Ref: H. Zhang, VLSI’00]

[Figure: single-supply (VDD) reduced-swing circuit with transistors N1, N2, N3, P1, P2, P3, nodes in, in2, out, and load CL; plots: VTC and Transient]

Page 22: Optimizing Power @ Design Time Interconnect and Clocks

Differential (Clocked) Signaling

Allows for very low swings (200 mV)
Robust
Quadratic energy savings
But: doubling the wiring, extra clock signal, complexity

[Ref: T. Burd, UCB’01]

[Figure: clocked differential driver and receiver circuit with signals in, REF, VDD, clk, d, d_b, out, out_b, and wire loads CL]

Page 23: Optimizing Power @ Design Time Interconnect and Clocks

Lower Bound on Signal Swing?

Reduction of signal swing translates into higher power dissipation in receiver – trade-off between wire and receiver energy dissipation

Reduced SNR impacts reliability – current on-chip interconnect strategies require Bit Error Rate (BER) of zero (in contrast to communication and network links)
– Noise source: power supply noise, crosstalk

Swings as low as 200 mV have been reported [Ref: Burd’00], 100 mV definitely possible

Further reduction requires crosstalk suppression

[Figure: crosstalk suppression by shielding (GND wires between signal wires) and by folding]

Page 24: Optimizing Power @ Design Time Interconnect and Clocks

Quasi-Adiabatic Charging

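[Figure: voltage V versus time t, rising to VDD in steps of VDD/N]

[Ref: L. Svensson, ISLPED’96]

• Uses stepwise approximation of adiabatic (dis)charging

• Capacitors acting as “charge reservoir”

• Energy drawn from supply reduced by factor N

[Figure: driver circuit with tank capacitors CT1, CT2, …, CTN-1]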
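
The factor-of-N reduction can be illustrated with an idealized model. The sketch below assumes that each tank capacitor behaves as an ideal voltage source at a multiple of VDD/N and that the load settles fully at every step; the function name and sample values are illustrative assumptions.

```python
def stepwise_charging_loss(c_load, v_dd, n_steps):
    """Energy dissipated when charging c_load from 0 to v_dd in n equal steps.

    Each step connects the load to a source that is v_dd / n above the
    present load voltage, so the switch dissipates 0.5 * C * (v_dd / n)**2.
    """
    step = v_dd / n_steps
    loss = 0.0
    for _ in range(n_steps):
        loss += 0.5 * c_load * step**2
    return loss

C_LOAD = 1e-12  # load capacitance in F, assumed sample value
V_DD = 1.0      # supply voltage in V, assumed sample value

conventional = stepwise_charging_loss(C_LOAD, V_DD, 1)
for n in (1, 2, 4, 8):
    loss = stepwise_charging_loss(C_LOAD, V_DD, n)
    print(f"N = {n}: loss = {loss * 1e12:.4f} pJ, reduction = {conventional / loss:.1f}x")
```

Summing N steps of 0.5·C·(VDD/N)^2 gives 0.5·C·VDD^2/N, which is the single-step loss divided by N.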

Page 25: Optimizing Power @ Design Time Interconnect and Clocks

Charge Redistribution Schemes

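[Figure: charge-recycling bus with differential line pairs B0 and B1 stacked between VDD and GND. Line levels: VDD, 3VDD/4, VDD/2, VDD/4, GND; phases: Precharge (P), Eval (E), Precharge; receivers RX0 and RX1; example values B0 = 0, B1 = 1]

Charge recycled from top to bottom
Precharge phase equalizes differential lines
Energy/bit = 2C(VDD/N)^2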

Challenges: Receiver design, noise margins

[Ref: H. Yamauchi, JSSC’95]

Page 26: Optimizing Power @ Design Time Interconnect and Clocks

Alternative Communication Schemes

Example: Capacitively-driven wires

Offers some compelling advantages
Reduced swing
– Swing is VDD/(n+1) without extra supply
Reduced load
– Allows for smaller driver
Reduced delay
– Capacitor pre-emphasizes edges
Pitchfork capacitors exploit sidewall capacitance

[Ref: D. Hopkins, ISSCC’07]

Page 27: Optimizing Power @ Design Time Interconnect and Clocks

Signaling Protocols

[Figure: processor module (µProc, ALU, MPY, SRAM…) connected to the network through din, reqin, ackin, dout, reqout, ackout; internal signals Din, REQin, done]

Globally asynchronous: self-timed handshaking protocol

Allows individual modules to dynamically trade off performance for energy efficiency

Page 28: Optimizing Power @ Design Time Interconnect and Clocks

Signaling Protocols

[Figure: processor module (µProc, ALU, MPY, SRAM…) attached to the network through a physical layer interface module. Network side: din, reqin, ackin, dout, reqout, ackout; module side: din, dout, clk; internal signals Din, REQin, Clk, done]

Locally synchronous
Globally asynchronous

Page 29: Optimizing Power @ Design Time Interconnect and Clocks

The Data Link/Media Access Layer

Reliable transmission over physical link and sharing interconnect medium between multiple sources and destinations (MAC)
Bundling, serialization, packetizing
Error detection and correction
Coding
Multiple-access schemes

[Figure: OSI protocol stack (Presentation/Application, Session, Transport, Network, Data Link, Physical)]

Page 30: Optimizing Power @ Design Time Interconnect and Clocks

Coding

[Figure: TX → Encoder (N bits in, N + k bits out) → Link → Decoder (N bits out) → RX]

Adding redundancy to communication link (extra bits) to:
– Reduce transitions (activity encoding)
– Reduce energy/bit (error-correcting coding)

Page 31: Optimizing Power @ Design Time Interconnect and Clocks

Activity Reduction Through Coding

[Ref: M. Stan, TVLSI’95]

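[Figure: Encoder (N bits in, N + 1 bits out) → Decoder (N bits out); signals D, Denc, D; invert bit p]

Example: Bus-Invert Coding

Data word D is inverted if its Hamming distance from the previously transmitted word is larger than N/2; the invert bit p signals the inversion.

D          #T    Denc       p    #T
00101010   -     00101010   0    -
00111011   2     00111011   0    2
11010100   7     00101011   1    1+1
00001101   5     00001101   0    3+1
01110110   6     10001001   1    2+1
…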
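
A minimal sketch of a bus-invert encoder and decoder follows, using the five data words of the example. It assumes that the inversion decision is taken against the previously transmitted (encoded) word and that the bus starts at all zeros with p = 0; the function names are illustrative.

```python
def hamming_distance(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")

def bus_invert_encode(words, width):
    """Encode a sequence of `width`-bit words with bus-invert coding.

    Returns a list of (encoded_word, p) pairs, where p is the invert bit.
    The bus is assumed to start at all zeros with p = 0.
    """
    mask = (1 << width) - 1
    prev = 0
    encoded = []
    for word in words:
        if hamming_distance(word, prev) > width / 2:
            word, p = ~word & mask, 1   # send the complement, flag it with p
        else:
            p = 0
        encoded.append((word, p))
        prev = word                     # next decision uses the bus contents
    return encoded

def bus_invert_decode(encoded, width):
    """Recover the original words by undoing the inversion where p = 1."""
    mask = (1 << width) - 1
    return [(~w & mask) if p else w for w, p in encoded]

data = [0b00101010, 0b00111011, 0b11010100, 0b00001101, 0b01110110]
enc = bus_invert_encode(data, width=8)
for word, p in enc:
    print(f"{word:08b} p={p}")
assert bus_invert_decode(enc, 8) == data
```

For these words the encoder inverts the third and fifth word, so no transfer toggles more than N/2 data wires.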

Page 32: Optimizing Power @ Design Time Interconnect and Clocks

Bus-Invert Coding

Gain: 25 % (at best – for random data)
Overhead:
– Extra wire (and activity)
– Encoder, decoder
Not effective for correlated data

[Figure: bus-invert encoder and decoder around the bus; labels: Reg, LP, Encode, Decode, D, Denc, p, Bus]

[Ref: M. Stan, TVLSI’95]

Page 33: Optimizing Power @ Design Time Interconnect and Clocks

Other Transition Coding Schemes

Advanced bus-invert coding (e.g. partition bus into sub-components) (e.g. [M. Stan, TVLSI’97])

Coding for address busses (which often display sequentiality) (e.g. [L. Benini, DATE’98])

Full-fledged channel coding, borrowed from communication links (e.g. [S. Ramprasad, TVLSI’99])

Coding to reduce impact of Miller capacitance between neighboring wires [Ref: Sotiriadis, ASPDAC’01]

Maximum capacitance transition – can be avoided by coding

bit k-1   bit k   bit k+1   Delay factor g
↑         ↑       ↑         1
↑         ↑       −         1 + r
↑         ↑       ↓         1 + 2r
−         ↑       −         1 + 2r
−         ↑       ↓         1 + 3r
↓         ↑       ↓         1 + 4r
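
The delay-factor table follows a simple rule that can be written down directly. In the sketch below, each neighbor contributes 0, 1 or 2 times r depending on whether it switches with the middle wire, stays quiet, or switches against it. The encoding of transitions as +1, 0 and -1, the function name, and the reading of r as the ratio of inter-wire capacitance to ground capacitance are assumptions made for illustration.

```python
# Transition of a wire: +1 rising, -1 falling, 0 quiet.
UP, DOWN, QUIET = 1, -1, 0

def delay_factor(left, victim, right, r):
    """Delay factor g = 1 + m*r for the middle wire (bit k).

    Each neighbor adds 0 if it switches in the same direction as the victim,
    1 if it is quiet, and 2 if it switches in the opposite direction.
    """
    if victim == QUIET:
        raise ValueError("victim wire must be switching")
    m = sum(abs(victim - neighbor) for neighbor in (left, right))
    return 1 + m * r

r = 1.5  # assumed sample value
print(delay_factor(UP, UP, UP, r))        # 1.0  (g = 1)
print(delay_factor(QUIET, UP, QUIET, r))  # 4.0  (g = 1 + 2r)
print(delay_factor(DOWN, UP, DOWN, r))    # 7.0  (g = 1 + 4r)
```

A code that forbids both neighbors from switching against the middle wire removes the 1 + 4r case, which is the maximum capacitance transition.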

Page 34: Optimizing Power @ Design Time Interconnect and Clocks

Error-Correcting Codes

[Figure: Encoder (N bits in, N + k bits out) → Decoder (N bits out); signals D, Denc, D]

Adding redundancy allows for more aggressive scaling of signal swings and/or timing

Simpler codes such as Hamming prove most effective

Example: (4,3,1) Hamming Code

Code word: P1 P2 B3 P4 B5 B6 B7, with (sums modulo 2)

P1 + B3 + B5 + B7 = 0
P2 + B3 + B6 + B7 = 0
P4 + B5 + B6 + B7 = 0

e.g. if the three checks evaluate to 1, 1, 0, the result equals 3: B3 is wrong
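
The parity checks can be turned into a small encoder and single-error corrector. This is a sketch assuming the standard Hamming parity assignment, in which P4 covers B5, B6 and B7 and "+" denotes addition modulo 2 (XOR); the function names are illustrative.

```python
def hamming_encode(b3, b5, b6, b7):
    """Return the 7-bit code word [P1, P2, B3, P4, B5, B6, B7]."""
    p1 = b3 ^ b5 ^ b7
    p2 = b3 ^ b6 ^ b7
    p4 = b5 ^ b6 ^ b7
    return [p1, p2, b3, p4, b5, b6, b7]

def hamming_syndrome(word):
    """Evaluate the three parity checks.

    The result is the position (1..7) of a single flipped bit,
    or 0 if all checks pass.
    """
    p1, p2, b3, p4, b5, b6, b7 = word
    s1 = p1 ^ b3 ^ b5 ^ b7
    s2 = p2 ^ b3 ^ b6 ^ b7
    s4 = p4 ^ b5 ^ b6 ^ b7
    return s1 + 2 * s2 + 4 * s4

def hamming_correct(word):
    """Correct at most one flipped bit and return the repaired code word."""
    fixed = list(word)
    position = hamming_syndrome(fixed)
    if position:
        fixed[position - 1] ^= 1
    return fixed

sent = hamming_encode(1, 0, 1, 1)
received = list(sent)
received[2] ^= 1                   # flip B3 (position 3) on the link
print(hamming_syndrome(received))  # 3: checks give 1, 1, 0
assert hamming_correct(received) == sent
```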

Page 35: Optimizing Power @ Design Time Interconnect and Clocks

Media Access

Sharing of physical media over multiple data streams increases capacitance and activity (see Chapter 5), but reduces area

Many multi-access schemes known from communications
– Time domain: Time-Division Multiple Access (TDMA)
– Frequency domain: narrow band, code division multiplexing

Buses based on arbitration-based TDMA most common in today’s ICs

Page 36: Optimizing Power @ Design Time Interconnect and Clocks

Bus Protocols and Energy

Some lessons from the communications world:
– When utilization is low, simple schemes are more effective
– When traffic is intense, reservation of resources minimizes overhead and latency (collisions, resends)

Combining the two leads to energy efficiency

Example: SiliconBackplane MicroNetwork

[Figure: bus timing diagram with labels Current Slot, Arbitration, Command]

[Courtesy: Sonics, Inc]

Independent arbitration for every cycle includes two phases:
- Distributed TDMA for guaranteed latency/bandwidth
- Round robin for random access

Page 37: Optimizing Power @ Design Time Interconnect and Clocks

The Network Layer

Topology-independent end-to-end communication over multiple data links (routing, bridging, repeaters)
Topology
Static versus dynamic configuration/routing

[Figure: OSI protocol stack (Presentation/Application, Session, Transport, Network, Data Link, Physical)]

Becoming more important in today’s complex multi-processor designs: “The Network-on-a-Chip (NOC)”

[Ref: G. De Micheli, Morgan-Kaufman’06]

Page 38: Optimizing Power @ Design Time Interconnect and Clocks

Network-on-a-Chip (NoC)

Dedicated networks with reserved links preferable for high traffic channels – but: limited connectivity, area overhead

Flexibility an increasing requirement in multi- (many-) core chip implementations

Page 39: Optimizing Power @ Design Time Interconnect and Clocks

The Network Trade-offs

Interconnect-oriented architecture trades off flexibility, latency, energy and area-efficiency through the following concepts
Locality - eliminate global structures
Hierarchy - expose locality in communication requirements
Concurrency/Multiplexing

Very similar to architectural space trade-offs

[Figure: dedicated wiring versus Network-on-a-Chip; labels: Proc, Local Logic, Router, Network Wires]

[Courtesy: B. Dally, Stanford]

Page 40: Optimizing Power @ Design Time Interconnect and Clocks

Networking Topology

Homogeneous
– Crossbar, Butterfly, Torus, Mesh, Tree, …

Heterogeneous
– Hierarchy

[Figure: example topologies: Mesh (FPGA), Tree, Crossbar]

Page 41: Optimizing Power @ Design Time Interconnect and Clocks

Network Topology Exploration

[Figure: two plots of energy × delay versus Manhattan distance. First plot: Mesh and Binary Tree. Second plot: Mesh, Binary Tree, and Mesh + Inverse]

Short connections in tree are redundant

Inverse clustering complements mesh

[Ref: V. George, Springer’01]

Page 42: Optimizing Power @ Design Time Interconnect and Clocks

Circuit-Switched versus Packet Based

On-chip reality: wires (bandwidth) are relatively cheap, buffering and routing expensive

Packet-switched approach versatile
– Preferred approach in large networks
– But … routers come with large overhead
– Case study Intel: 18% of power in link, 82% in router

Circuit-switched approach attractive for high-data-rate quasi-static links

Hierarchical combination often preferred choice

[Figure: clusters of cores (C) sharing a local bus, with routers (R) connecting the clusters. Captions: “Bus to connect over short distances”; “Hierarchical circuit and packet switched networks for longer connections”]

Page 43: Optimizing Power @ Design Time Interconnect and Clocks

Example: The Pleiades Network-on-a-Chip

• Configurable platform for low-energy communication and signal-processing applications (See Chapter 5)

• Allows for dynamic task-level reconfiguration of process network

Energy-efficient flexible network essential to the concept

[Figure: Pleiades architecture; labels: Configuration Bus, Configurable Interconnect, Arithmetic Module (three instances), Configurable Logic (two instances), µP, Configuration, Dedicated Arithmetic, Network Interface]

[Ref: H. Zhang, JSSC’00]

Page 44: Optimizing Power @ Design Time Interconnect and Clocks

Pleiades Network Layer

[Figure: hierarchical reconfigurable mesh network; labels: Cluster, Level-1 Mesh, Level-2 Mesh, Universal Switchbox, Hierarchical Switchbox]

• Network statically configured at start of session and ripped up at end

• Structured approach reduces interconnect energy by a factor of 7 over a straightforward crossbar

Page 45: Optimizing Power @ Design Time Interconnect and Clocks

Top Layers of the OSI Stack

Abstracts communication architecture to system and performs data formatting and conversion

Establishes and maintains end-to-end communications – flow control, message reordering, packet segmentation and reassembly

[Figure: OSI protocol stack (Presentation/Application, Session, Transport, Network, Data Link, Physical)]

Example: Establish, maintain and rip-up connections in dynamically reconfigurable Systems-on-a-Chip – Important in power-management

Page 46: Optimizing Power @ Design Time Interconnect and Clocks

What About Clock Distribution?

Clock easily the most energy-consuming signal of a chip
– Largest length
– Largest fanout
– Most activity (α = 1)

Skew control adding major overhead
– Intermediate clock repeaters
– De-skewing elements

Opportunities
– Reduced swing
– Alternative clock distribution schemes
– Avoiding a global clock altogether

Page 47: Optimizing Power @ Design Time Interconnect and Clocks

Reduced-Swing Clock Distribution

Similar to reduced-swing interconnect
Relatively easy to implement
But: extra delay in flip-flops adds directly to clock period

Example: half-swing clock distribution scheme

[Figure: clock waveforms between GND and VDD for a regular 2-phase clock and for a half-swing clock, each showing the NMOS clock and the PMOS clock]

[Ref: H. Kojima, JSSC’95]

© IEEE 1995

Page 48: Optimizing Power @ Design Time Interconnect and Clocks

Alternative Clock Distribution Schemes

Canceling skew in perfect transmission line scenario

Example: Transmission-Line Based Clock Distribution

[Ref: V. Prodanov, CICC’06]

© IEEE 2006

Page 49: Optimizing Power @ Design Time Interconnect and Clocks

Summary

Interconnect important component of overall power dissipation

Structured approach with exploration at different abstraction layers most effective

Lot to be learned from communications and networking community – yet, techniques must be applied judiciously
– Cost relationship between active and passive components different

Some exciting possibilities for the future: 3D-integration, novel interconnect materials, optical or wireless I/O

Page 50: Optimizing Power @ Design Time Interconnect and Clocks

References

Books and Book Chapters

T. Burd, “Energy-Efficient Processor System Design,” http://bwrc.eecs.berkeley.edu/Publications/2001/THESES/energ_eff_process-sys_des/index.htm, UCB, 2001.

G. De Micheli and L. Benini, “Networks on Chips: Technology and Tools,” Morgan Kaufmann, 2006.

V. George and J. Rabaey, “Low-energy FPGAs: Architecture and Design,” Springer, 2001.

J. Rabaey, A. Chandrakasan, B. Nikolic, “Digital Integrated Circuits: A Design Perspective,” 2nd ed., Prentice Hall, 2003.

C. Svensson, “Low-Power and Low-Voltage Communication for SoC’s,” in C. Piguet, Low-Power Electronics Design, Ch. 14, CRC Press, 2005.

L. Svensson, “Adiabatic and Clock-Powered Circuits,” in C. Piguet, Low-Power Electronics Design, Ch. 15, CRC Press, 2005.

G. Yeap, “Special Techniques,” in Practical Low Power Digital VLSI Design, Ch. 6, Kluwer Academic Publishers, 1998.

Articles

L. Benini et al., “Address bus encoding techniques for system-level power optimization,” Proceedings DATE’98, pp. 861-867, Paris, February 1998.

T. Burd et al., “A Dynamic Voltage Scaled Microprocessor System,” IEEE ISSCC Digest of Technical Papers, pp. 294-295, Feb. 2000.

M. Chang et al., “CMP Network-on-Chip Overlaid with Multi-Band RF Interconnect,” International Symposium on High-Performance Computer Architecture, Feb. 2008.

D.M. Chapiro, “Globally Asynchronous Locally Synchronous Systems,” PhD thesis, Stanford University, 1984.

Page 51: Optimizing Power @ Design Time Interconnect and Clocks

References (cntd)

W. Dally, “Route Packets, Not Wires: On-Chip Interconnect Networks,” Proceedings DAC 2001, pp. 684-689, Las Vegas, June 2001.

J. Davis and J. Meindl, “Is Interconnect the Weak Link?,” IEEE Circuits and Systems Magazine, pp. 30-36, March 1998.

J. Davis et al., “Interconnect Limits on Gigascale Integration (GSI) in the 21st Century,” Proceedings of the IEEE, Vol. 89, No. 3, pp. 305-324, March 2001.

D. Hopkins et al., “Circuit techniques to enable 430Gb/s/mm2 proximity communication,” IEEE International Solid-State Circuits Conference, vol. XL, pp. 368-369, February 2007.

H. Kojima et al., “Half-Swing Clocking Scheme for 75% Power Saving in Clocking Circuitry,” IEEE Journal of Solid-State Circuits, vol. 30, no. 4, pp. 432-435, April 1995.

E. Kusse and J. Rabaey, “Low-energy embedded FPGA structures,” Proceedings ISLPED’98, pp. 155-160, Monterey, Aug. 1998.

V. Prodanov and M. Banu, “GHz Serial Passive Clock Distribution in VLSI using Bidirectional Signaling,” Proceedings CICC 06.

S. Ramprasad et al., “A coding framework for low-power address and data busses,” IEEE Transactions on VLSI Signal Processing, Vol. 7, No. 2, pp. 212-221, June 1999.

M. Sgroi et al., “Addressing the System-on-a-Chip Woes Through Communication-Based Design,” Proceedings DAC 2001, pp. 678-683, Las Vegas, June 2001.

P. Sotiriadis and A. Chandrakasan, “Reducing Bus Delay in Submicron Technology Using Coding,” Proceedings ASPDAC Conference, Yokohama, January 2001.

Page 52: Optimizing Power @ Design Time Interconnect and Clocks

References (cntd)

M. Stan and W. Burleson, “Bus-Invert Coding for Low-Power I/O,” IEEE Transactions on VLSI, pp. 48-58, March 1995.

M. Stan and W. Burleson, “Low-Power Encodings for Global Communication in CMOS VLSI,” IEEE Transactions on VLSI Systems, pp. 444-455, Dec. 1997.

V. Sathe, J.-Y. Chueh, and M. C. Papaefthymiou, “Energy-Efficient GHz-Class Charge-Recovery Logic,” IEEE JSSC, vol. 42, no. 1, pp. 38-47, January 2007.

L. Svensson et al., “A sub-CV2 pad Driver with 10 ns Transition Time,” Proc. ISLPED 96, Monterey, Aug. 12-14, 1996.

D. Wingard, “Micronetwork-Based Integration for SOCs,” Proceedings DAC 01, pp. 673-677, Las Vegas, June 2001.

H. Yamauchi et al., “An Asymptotically Zero Power Charge Recycling Bus,” IEEE Journal of Solid-State Circuits, vol. 30, no. 4, pp. 423-431, April 1995.

H. Zhang, V. George and J. Rabaey, “Low-Swing on-chip Signaling Techniques: Effectiveness and Robustness,” IEEE Transactions on VLSI Systems, Vol. 8, No 3, pp. 264-272, June 2000.

H. Zhang et al, “A 1V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications,” IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1697-1704, Nov. 2000.