MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS Robert Mullins Computer Architecture Group Computer Laboratory University of Cambridge, UK

Dec 20, 2015


MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS

Robert Mullins

Computer Architecture Group

Computer Laboratory

University of Cambridge, UK


2/19

Communication-Centric Architectures

• Future performance gains will come primarily from increasing the number of IP cores in a system, not their complexity or operating frequency
• Many reasons:
  – Diminishing returns from simply scaling what we have
  – Energy efficiency
  – Complexity
  – Fault tolerance
  – Economics


3/19

On-Chip Networks

• An efficient general purpose chip-wide communication infrastructure is becoming essential

• One flexible networking option is to use packet-switched networks with support for virtual channels


4/19

The Lochside Router

• Router Architecture
  – Highly parameterised implementation
  – Packet-switched network with virtual-channel flow-control
  – Best case latency is one cycle per network hop
• Results presented here are from post P&R simulations targeting a 90nm technology

[Figure: Lochside Chip (2004/05), 180nm Technology. Labels: TILE; R; Traffic Generator, Debug & Test]


5/19

Exploiting Speculation to Reduce Communication Latency

Peh/Dally (2001)


6/19

Exploiting Speculation to Reduce Communication Latency


7/19

Aims of this work

• Apply existing power saving techniques to an on-chip network design
  – e.g. clock and signal gating, gate-level optimisations etc.
  – Importance of applying such techniques before making comparisons
• Measure power consumption and provide an accurate breakdown of where the remaining power is dissipated
• Where is the best place to look for future power savings?


8/19

Measuring and Optimizing Dynamic Power

• Our Test Case
  – 8mm x 8mm die
  – 4x4 mesh network
  – Low-latency routers, best case latency is one cycle per hop (incl. interconnect)
  – 1.2V, 90nm technology
  – 4 input-buffers/VC
  – 4 VC/input port
  – 48 x 80-bit network links
  – 800MHz @ WC PVT
    • ~32 FO4 clock period
  – Results reported at 250MHz
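Two of the test-case figures can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that the 48 links are unidirectional router-to-router links (local tile connections excluded) and that the ~32 FO4 figure refers to the 800MHz clock period. The function names are hypothetical.

```python
# Sanity-check two figures from the test case: link count and implied FO4 delay.

def mesh_unidirectional_links(rows: int, cols: int) -> int:
    """Count router-to-router links in a 2D mesh, one per direction."""
    horizontal = rows * (cols - 1)       # neighbour pairs along each row
    vertical = cols * (rows - 1)         # neighbour pairs along each column
    return 2 * (horizontal + vertical)   # each pair has one link per direction


def fo4_delay_ps(clock_mhz: float, fo4_per_cycle: float) -> float:
    """Implied FO4 delay in picoseconds for a given clock rate and logic depth."""
    period_ps = 1e6 / clock_mhz          # a 1 MHz clock has a 1e6 ps period
    return period_ps / fo4_per_cycle


links = mesh_unidirectional_links(4, 4)
fo4 = fo4_delay_ps(800, 32)
print(links)          # 48
print(round(fo4, 1))  # 39.1
```

Under these assumptions a 4x4 mesh gives exactly the 48 links quoted, and the 1.25ns worst-case clock period implies an FO4 delay of roughly 39ps.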


9/19

Interconnect Delay/Energy Trade-offs

• Power dissipated in network links depends on how links are spaced and buffered

• At least a factor of 3 difference in energy consumption over range of potential interconnect options

• Could move to low-swing differential schemes for even greater energy savings

For the results we assume minimum-spaced wires with an optimal energy x delay product
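As background for the link-energy comparison, the textbook first-order model of CMOS switching power is sketched below. It is not taken from the slides; the symbols (activity factor \alpha, switched capacitance C, supply voltage V_{dd}, clock frequency f) are assumptions introduced here for illustration.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f
```

Under this model, wire spacing and repeater insertion change the switched capacitance C, which is one way the choice of interconnect option affects link power. Reduced-swing signalling lowers the voltage term instead.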


10/19

Clock Gating

• Clock gating optimisations applied at two levels:
  – Local Clock Gating
    • Automated clock gating within router
    • Some tuning of RTL involved to maximise opportunities for synthesis tool
  – Router Level Clock Gating
    • Exploit opportunities to gate clock as it enters the router
    • Isolates router’s clock completely, only static power consumption remains


11/19

Router-Level Clock Gating

• Clock gating exposes clock tree insertion delay
• Need to know early if router will be required
• Generate ‘early valid’ signals in neighbouring routers
  – Early-valid signals are slightly pessimistic
  – Based on what is requested not granted
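The early-valid idea can be illustrated with a toy cycle-level model. This is a minimal sketch, not the Lochside implementation: the one-cycle lead time, the function name and the signal names are all assumptions made for illustration. It shows why basing the signal on requests rather than grants occasionally enables the clock for a cycle in which no data arrives.

```python
# Toy cycle-level model of router-level clock gating (all names hypothetical).

def gated_clock_trace(requests, occupancy):
    """Return per-cycle clock-enable bits for one router.

    requests[t]  -- True if a neighbour requests a path to this router in
                    cycle t (the 'early valid' signal, raised before the
                    neighbour's arbitration has granted the request)
    occupancy[t] -- True if this router holds buffered data in cycle t
    """
    enable = []
    early_valid_prev = False
    for req, busy in zip(requests, occupancy):
        # Run the clock if data may arrive (flagged one cycle early, an
        # assumed lead time) or if buffered data still needs routing.
        enable.append(early_valid_prev or busy)
        early_valid_prev = req
    return enable


# Cycle 1: request is granted, data arrives in cycle 2.
# Cycle 4: request is not granted, so the clock enable in cycle 5 is wasted.
requests = [False, True, False, False, True, False]
occupancy = [False, False, True, False, False, False]
enable = gated_clock_trace(requests, occupancy)
print(enable)       # [False, False, True, False, False, True]
print(sum(enable))  # 2 clocked cycles out of 6
```

In this trace the clock is gated for four of six cycles. The enabled cycle 5 is the pessimistic case: the request in cycle 4 was never granted, yet the router was woken anyway.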


12/19

Gate-Level Optimizations and Signal Gating

• Automated signal gating and gate-level power optimisations had minimal impact
• Inserting signal gating logic manually did reduce input FIFO power requirements significantly
• The reported results could be further improved (by 12%) by enabling logic optimisation across module boundaries
  – This was restricted to accurately determine where power is dissipated


13/19

Analysis of Power Consumption

• Simple power optimisations can quarter power requirements + many more opportunities to save power
• Network is ~5% of core area
• Perhaps 10% of system power at present
• Don’t make comparisons without optimizing power!

[Figure: Power consumption of a single router and its links]


14/19

Analysis of Power Consumption

• 22% Static power, 11% Inter-Router Links
• ~1% Global Clock tree
• 65% Dynamic Power
  – Power Breakdown
    • ~50% of dynamic power is consumed in local clock tree and input FIFOs
    • ~30% on router datapath
    • ~20% on scheduling and arbitration
      – Scheduling is probably more complex than typical implementations due to speculation
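To see what the dynamic-power split means relative to total power, the quoted percentages can be combined as below. The figures are the approximate values from the slide; the dictionary keys are labels chosen here for illustration.

```python
# Shares of total power quoted on the slide (percent, approximate).
total_share = {
    "static": 22,
    "inter_router_links": 11,
    "global_clock_tree": 1,
    "dynamic": 65,
}

# Split of the dynamic share (percent of dynamic power, approximate).
dynamic_split = {
    "local_clock_tree_and_input_fifos": 50,
    "router_datapath": 30,
    "scheduling_and_arbitration": 20,
}

# Express each dynamic component as a percentage of total power.
dynamic_of_total = {
    name: total_share["dynamic"] * pct / 100
    for name, pct in dynamic_split.items()
}

accounted = sum(total_share.values())
for name, pct in dynamic_of_total.items():
    print(f"{name}: {pct:.1f}% of total")
print(f"accounted for: {accounted}%")
```

On these numbers the local clock tree and input FIFOs account for about a third of total power (32.5%), the datapath for 19.5% and scheduling and arbitration for 13%. The top-level shares sum to 99%, which is consistent with the values being rounded.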


15/19

Low-Power On-Chip Networks

• Interconnect and static power set to increase
  – Many low-power link technologies
    • Low-swing differential techniques
  – Power gating and other leakage reduction techniques
• Potential power savings begin to require lots of different techniques – no one silver bullet?


16/19

Low-Power On-Chip Networks

• Topology
  – Don’t want to sacrifice general or at least multi-purpose nature of our networked SoC
  – Results suggest higher radix routers and longer interconnects could reduce power
    • Probably not a long term solution
    • Reduces path diversity, bad for fault-tolerance
• Architecture
  – Scope for minimising memory required to store precomputed router schedule (particular to our router)
  – Simpler routers
  – Single cycle routers reduce power? Speculation for low-power?


17/19

Supporting Best-Effort (BE) and Guaranteed Services (GS) Efficiently

• Current timing of the datapath and link suggests additional GS data could be routed in the same clock cycle
  – Allocate datapath/link to GS traffic for first ½ of clock cycle
    • Double capacity of network
  – Exploit simpler GS circuit-switched routing when possible
  – Reduce power
• Very little additional overhead


18/19

Clocking On-Chip Networks

• Network system timing issues are interesting
  – naturally event-driven not synchronous
• Work is investigating placing local data-driven clock generators in each network router
  – Clock is stretched when no data to be routed
  – Clock matches rate of incoming data streams
  – Robust synchronisation solution (true GALS)
  – Also investigating incorporating power gating support
• See also Distributed Clock Generator – DCG (Fairbanks/Moore)


19/19

Challenges and Future Work

• These are early results in a much more rigorous study on the power requirements of networked on-chip communication
  – Much more soon!
• Exploiting a general-purpose on-chip network
  – Exploiting execution diversity to improve energy-efficiency
  – Multi-use platforms and Virtual-IP
  – Fault tolerance
  – Networks of processing elements or networks that process?
• Scope for removing unnecessary interfaces and boundaries
• Impact of networking on IP and processor core design


Thank You