Asynchronous vs. Synchronous Design Techniques for NoCs
Network on Chip, Advanced Topics in VLSI 1 - 049036, April 2008
Presented by: Alex Rekhelis – 308873140
Mentor: Ran Ginosar

Dec 20, 2015



Agenda

Introduction
Synchronous On-Chip Networks
– Previous approaches survey
Asynchronous On-Chip Networks
– Previous approaches survey
New approaches
Conclusion
– Sync
– Async
Bibliography


Introduction

Highlight the wide range of synchronous / asynchronous on-chip networks

Contrast different approaches

Present new approaches


Synchronous On-Chip Networks


Generic On-Chip Router


Synchronous Router Pipeline

Numerous router pipeline stages
– Raise communication latency
– Can make packet buffers less effective
– Incur pipelining overheads


Speculative Router Architecture

VC and switch allocation may be performed concurrently
– Speculate that waiting packets will be successful in acquiring a VC
– Prioritize non-speculative requests over speculative ones

Li-Shiuan Peh and William J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers”, In Proceedings HPCA’01, 2001.
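The prioritization rule on this slide can be illustrated with a small behavioural model of one switch-allocation round. This is only a sketch: the names `Request` and `allocate_switch`, the request layout, and the lowest-input-port tie-break are assumptions made for the example, not details taken from Peh and Dally's paper.

```python
from typing import NamedTuple

class Request(NamedTuple):
    input_port: int
    output_port: int
    speculative: bool  # True if the packet has not yet acquired a VC

def allocate_switch(requests):
    """Grant at most one request per output port and per input port.

    Non-speculative requests are served before speculative ones; ties are
    broken by lowest input port (an assumption of this sketch).
    """
    grants = {}          # output_port -> granted Request
    busy_inputs = set()  # input ports that already hold a grant
    # False sorts before True, so non-speculative requests come first.
    for req in sorted(requests, key=lambda r: (r.speculative, r.input_port)):
        if req.output_port in grants or req.input_port in busy_inputs:
            continue
        grants[req.output_port] = req
        busy_inputs.add(req.input_port)
    return grants
```

With a speculative request from input 0 and a non-speculative request from input 1 competing for the same output, the model grants input 1; the speculative request simply retries in a later round.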


Single Cycle Speculative Router

R. D. Mullins, A. West and S. W. Moore, “Low-Latency Virtual-Channel Routers for On-Chip Networks”, In Proceedings ISCA’04.


Single Cycle Speculative Router

Single cycle router made possible by use of speculation
Clock period is almost unchanged (compared to pipelined design)
– Approx. 30 FO4 (simple standard-cell design)
Presence of clock simplifies design
– Arbitration
    – Fast combinational matrix arbiters
    – Can easily be extended to handle priority traffic etc.
– Speculation
    – Aided by the clear notion of a clock “cycle”
    – Simple abort logic (abort detection and actual abort)
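For readers unfamiliar with the matrix arbiters mentioned above, the following is a minimal behavioural sketch of the usual least-recently-served matrix scheme. The class name `MatrixArbiter` and the initial priority order (lower index wins) are assumptions of this example; the slides do not describe the arbiter's internals.

```python
class MatrixArbiter:
    """Behavioural model of an n-input matrix arbiter."""

    def __init__(self, n):
        self.n = n
        # prio[i][j] is True when requester i currently beats requester j.
        # Assumed initial order: lower index has higher priority.
        self.prio = [[i < j for j in range(n)] for i in range(n)]

    def arbitrate(self, requests):
        """Return the granted index, or None if there are no requests."""
        for i in range(self.n):
            if requests[i] and all(
                self.prio[i][j]
                for j in range(self.n)
                if j != i and requests[j]
            ):
                # The winner drops to lowest priority for later rounds.
                for j in range(self.n):
                    if j != i:
                        self.prio[i][j] = False
                        self.prio[j][i] = True
                return i
        return None
```

When all three inputs of a 3-input arbiter keep requesting, the grants rotate 0, 1, 2, 0, which is the fairness property that makes this structure attractive in a clocked router.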


Synchronous architecture techniques (almost asynchronous)

Router clocks derived from a single source
Locally Generated Clocks (periodic & free-running)
Synchronous Routers with Asynchronous Links
Locally Clocked Routers / Asynchronous Interconnect (GALS style network)
– Can support asynchronous interconnects
    – No longer exploiting periodic nature of router clocks
    – Correct operation is independent of the delay of the link
– GALS interfaces with pausable clocks
    – If necessary the clock is stretched; data is always transferred reliably (value safe)
    – Need to construct local delay line
Local aperiodic clock generation
Data-Driven Local Clock
– Similarities to stoppable GALS interface and asynchronous priority arbiters


GALS – Clock Pausing

Simple GALS interface (receiver)
Note: Req/Ack uses 2-phase handshaking protocol
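The 2-phase (transition-signalling) protocol noted above can be sketched as a tiny software model: every toggle of `req` announces one new data item and every toggle of `ack` accepts it, so the channel is idle whenever the two wires are equal. The class `TwoPhaseChannel` and its method names are hypothetical and only illustrate the protocol; they do not model the circuit shown on the slide.

```python
class TwoPhaseChannel:
    """Behavioural model of a 2-phase req/ack handshake channel."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None

    def can_send(self):
        # Idle when both wires agree: the last item was acknowledged.
        return self.req == self.ack

    def send(self, value):
        if not self.can_send():
            raise RuntimeError("previous transfer not yet acknowledged")
        self.data = value
        self.req ^= 1  # one transition on req = one new data item

    def can_receive(self):
        return self.req != self.ack

    def receive(self):
        if not self.can_receive():
            raise RuntimeError("no pending data")
        value = self.data
        self.ack ^= 1  # one transition on ack = data accepted
        return value
```

Unlike a 4-phase protocol there is no return-to-zero step, so two consecutive transfers leave both wires back at their starting level.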


Synchronous Routers - Summary

Can design high-performance single cycle routers
Design is simplified by presence of global synchrony
Distribution of global clock can be eased by
– New clock generation / distribution techniques
– Source synchronous communication
Network operating frequency
– Relax global synchrony further
– Data-driven clocking determines most appropriate router clock frequency automatically


Asynchronous On-Chip Networks


Why are asynchronous NoCs interesting?

No clock distribution, simple solution
Networked IP blocks run at different clock frequencies
– No synchronization issues at interfaces
Ability to exploit data / path-dependent delays
– Low-latency common or high-priority paths through router
Freedom to optimize network links
– Not constrained by need to distribute/generate multiple clock frequencies. Can exploit high-frequency narrow links
– Dynamic latency/throughput trade-offs (adaptive pipeline depth)
– Exploit dynamic optimizations on links (e.g. DVS)
Easy to use interfaces, modularity, robust and simple implementation, reduced design time
Some arguments for reduced power


Asynchronous Circuit Basics

Control in asynchronous circuits often relies on simple handshaking protocols (req / ack event cycles)

Delay-insensitive event-driven system - every signal transition is acknowledged

The C-element is a fundamental building block of many asynchronous circuits

– Can be thought of as an AND-gate for events
Arbitration
– Mutex
– Tree arbiter element
– Multiway arbiter
– Static priority arbiters
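The "AND-gate for events" behaviour of the C-element mentioned above can be shown with a minimal behavioural model: the output follows the inputs only when they agree and otherwise holds its previous value. The class name `CElement` and its `update` method are illustrative choices for this sketch.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # Output changes only when both inputs agree; otherwise it holds.
        if a == b:
            self.out = a
        return self.out
```

Starting from 0, the output rises only after both inputs have risen, and falls only after both have fallen, which is why it can wait for an event on every input before producing its own.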


Asynchronous on-chip networks

How do we build more complex on-chip routers?
– Support for virtual channels
– QoS
Challenges
– Multi-way & prioritised arbitration
– Control overheads
    – Arbitration and delay-insensitive circuits can be slow!
    – How can control overheads be hidden?


Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC

E. Beigné, F. Clermidy, S. Miermont, and P. Vivet

NOCS 2008


Outline of the paper

Dynamic and Static power consumption issues

NoC architecture for DVFS support

NoC Unit architecture

NoC Unit design

DVFS execution at system level

Power gain & physical implementation


Dynamic and Static power consumption issues

Dynamic power consumption reduction
– Reduce: switching activity, capacitances, supply voltage, frequency
Static power consumption reduction
– Reduce: supply voltage, dominant leakage currents: ISTH, IGIDL, IGATE
– Multi VTH design
Low power techniques must exist at all design levels, from architecture to physical implementation
Proposed: a fully integrated solution to locally control dynamic and static power at Unit level within a GALS NoC


NoC architecture for DVFS (1) – NoC architecture

A fully asynchronous Network-on-Chip
IP units are synchronous islands using programmable Local Clock Generator
Within the IP unit
– Synchronization is done thanks to Pausable Clock
– A Power Unit manages internal Vcore generated using external Vhigh and Vlow
– A Network Interface is in charge of NoC communications
Local Power Management
Main CPU in charge of global power management
Local fine grain power management can be executed during IP computation and communication independently from the others


NoC architecture for DVFS (2) – Main Principles

Each synchronous IP is an independent power and frequency domain
A local fine grain Dynamic Voltage Scaling:
– Implementation of a local hardware controller to control transitions between Vhigh and Vlow
– Ensures smooth DVS transitions for IP safe computation
A local fine grain Dynamic Frequency Scaling:
– Automatic frequency scaling
– Use of clock generation re-programming to find the optimal V/F point of operation
Thanks to the pausable clock technique, the IP unit continues its operation during DVFS phases
GALS architecture and local clock generation are a natural enabler for easy local DVFS


NoC Unit architecture

Each IP core encapsulated with
– Network Interface
– Test Wrapper
– Pausable Clock
– Power Supply Unit
IP units have 5 supply modes
– Init: reset at Vhigh (1.2V)
– High: Vhigh supply
– Low: Vlow supply (0.8V)
– Hopping: switch Vhigh / Vlow for DVFS
– Idle: retention state at Vlow (no clock)
– Off: stand-by mode


NoC Unit design (1) - Local Power Manager

Local Power Manager handles unit power modes
A set of programmable registers, through the NoC
Configuration of
– Programmable delay line
– Power Supply Unit
Pulse Width Modulator used to control the Hopping mode


NoC Unit design (2) – Power Supply Unit

Power Supply Unit manages Vcore
Two power switches, Thigh and Tlow (LVT transistors)
A Hopping Unit
An Ultra Cut-Off Generator

Local Power Supply Unit offers a safe control of internal power supply depending on pre-defined power modes


NoC Unit design (3) – Hopping Unit

Energy per operation scales with V²
– Decrease voltage (and frequency) to be energy efficient
«Triple state» power supply
– Use of two PMOS power switches
– Vhigh (1.2 V), Vlow (0.7 V), or OFF (0 V)
Switch between Vhigh and Vlow
– Transitions take less than 100 ns
– Mean speed / mean power of the IP is programmed by a PWM
Compatible with synchronous and asynchronous IPs
– For GALS system: coordination done with local clock generator
Can easily be integrated in any CMOS circuit
– No inductor contrary to traditional DC/DC converters
– No capacitor contrary to charge pump implementation
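How a PWM duty cycle sets the mean speed and mean power can be sketched with a first-order model in which switching power is proportional to f·V². Everything here is an assumption for illustration: the function name `hopping_operating_point`, the frequency arguments, and the model itself, which ignores leakage and the cost of the transitions. The default voltages are the 1.2 V and 0.8 V values quoted for the High and Low modes elsewhere in the deck, and the model is not expected to reproduce the measured figures reported in the paper.

```python
def hopping_operating_point(duty_high, f_high, f_low, v_high=1.2, v_low=0.8):
    """Return (mean frequency, dynamic power relative to always-High).

    duty_high is the fraction of time spent at Vhigh (0.0 to 1.0).
    Dynamic power is modelled as proportional to f * V**2 only.
    """
    if not 0.0 <= duty_high <= 1.0:
        raise ValueError("duty_high must be between 0 and 1")
    duty_low = 1.0 - duty_high
    mean_freq = duty_high * f_high + duty_low * f_low
    mean_power = duty_high * f_high * v_high**2 + duty_low * f_low * v_low**2
    full_power = f_high * v_high**2
    return mean_freq, mean_power / full_power
```

For example, with a made-up Low-mode clock at half the High-mode frequency, a 50 % duty cycle gives 75 % of the peak speed for roughly 61 % of the peak dynamic power under this model.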


NoC Unit design (4) – Ultra Cut-Off Generator

When reverse polarizing the gate, the leakage current goes through a minimum

The optimal polarization point varies with the temperature, the supply voltage and the process corners

The proposed UCO generator automatically polarizes the gate of the Power switch to its point of minimum leakage

Compensates for temperature variation and alleviates process-corner variations.

The gate oxide reliability is considered by introducing a passive stress reduction mechanism


NoC Unit design (5) – Pausable Clock Interface

Temporarily pause the clock when a transfer (NoC) or a supply switch is required

Based on
– Two GALS ports: Synchronous-to-Asynchronous and Asynchronous-to-Synchronous
– A programmable delay line
– A pausable clock generator

Pausable Clock Generator arbitrates pause requests
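The effect of pausing on the clock seen by the IP can be sketched with an idealized timing model: the next rising edge is due one delay-line period after the previous one, but if a pause request is active at that instant the edge is held back until the request is released. The function `rising_edges` and its interval-based pause representation are assumptions of this sketch; the real generator arbitrates between the clock and the pause requests in hardware, which is not modelled here.

```python
def rising_edges(period, n_edges, pauses):
    """Return the times of the first n_edges rising clock edges.

    period  -- delay-line period (time between unpaused edges)
    pauses  -- list of (start, end) intervals during which a pause
               request is asserted (end is exclusive)
    """
    edges = []
    t = 0.0
    ordered = sorted(pauses)
    while len(edges) < n_edges:
        # Stretch the clock while any pause request covers the due time.
        for start, end in ordered:
            if start <= t < end:
                t = float(end)
        edges.append(t)
        t += period
    return edges
```

With a period of 10 and a pause asserted from 15 to 27, the edge due at 20 is delayed to 27 and the clock then resumes its normal period from there, so no partial cycle is ever delivered to the IP.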


NoC Unit design (5) – Pausable Clock Interface

Programmable delay line
– Precise, small and low power
– Using standard cells
– On the same unit power domain
Pausable Clock Interface allows an efficient synchronization and a safe dynamic voltage and frequency scaling with minimal latency cost


Power gain

Programmable delay line matches with unit logic on the same power domain
– Compensates any mismatch thanks to re-programming
Power reduction
– Vhigh=1.2V and Vlow=0.8V
– 35 % dynamic power reduction between High and Low modes
– Hopping mode is used to save power without any latency cost
– Leakage power is reduced by 2 decades thanks to UCO
Power Supply Unit efficiency
– Hopping Unit
    – Only resistive losses in the power transistors
    – About 1 mW dynamic power
– => more than 95 % power efficiency
– 90 % total efficiency (external DC-DC taken into account)
An adaptive and reliable Power Supply Unit giving high power reduction factor and high power efficiency


Physical Implementation

Power Switch
– One single Power-Switch for the complete power domain
– Sized to get a speed loss <5%
– Area: under about 5% of the power domain
Hopping Unit
– Area: 140μm*35μm
– Hopping transition: <100 ns


What do we see in recent work?

A fully integrated DVFS architecture within a GALS NoC

We are able to handle leakage problems due to technology scaling
– Insertion of power switches
Dynamic power reduction is possible through voltage scaling:
– Hopping
– Management of multi power domains in a complex SoC
The knowledge of GALS systems leads to automatic frequency scaling
– Pausable clock interfaces
– Asynchronous implementation


Synchronous or Asynchronous NoCs?


Comparing Approaches

Published work on asynchronous routers and networks is scarce and lacks detailed explanation; simulation is difficult, at times almost impossible

Single latency / throughput figures don’t tell whole story

Detailed comparative studies with real traffic are required

Often difficult to isolate impact of choice of system timing style, many things tend to be different:

– Technology, circuit style, architecture

Problems in comparing synchronous and asynchronous designs


Questions about Asynchronous design?

Testing asynchronous circuits
– An asynchronous circuit replaces the clock with a large number of distributed state holding elements
– Large area overhead associated with test
– Testing of non-deterministic elements (MUTEX)
Performance
– “Asynchronous circuits avoid issues of timing closure, they are correct-by-construction”
– But performance guarantees are still required. Slow synchronous circuits are easy to build!
– Value safe versus time safe
– Predicting performance is complex
Perhaps on-chip communication is an application where such characteristics can be tolerated?


Synchronous or Asynchronous?

A clockless on-chip network appears to be an elegant solution although some questions remain:
– Test
– Performance concerns
Shouldn’t asynchronous designs offer latency advantages?
– Fast local control, path/data dependent delays, DI interconnects
Perhaps asynchronous routers mimic synchronous architectures too closely?
– Exploit flexibility, novel architectures, different topologies
Overheads for data-driven clocking or GALS currently look small in comparison
Synchronous design has advantages too
– Predictability and determinism can be exploited
    – Fast single cycle routers possible
– Global snapshot of state is good for scheduling
Still lots of interesting research to be done
– Need more data points


Conclusions

High cost associated with both global synchrony and delay-insensitive circuits

– Can relax constraints in both directions

Which techniques achieve the best cost/benefit mix for on-chip networks?

– Data-driven clocks look promising

[Figure labels: SYNCHRONOUS, ASYNCHRONOUS, ?]


Bibliography

R. Mullins, “Asynchronous versus synchronous design techniques for NoCs,” Invited lecture given at SoC 2005, part of a tutorial entitled “The status of the NoC revolution: design methods, architectures and silicon implementation,” Tampere, Finland, November 2005.

E. Beigné, F. Clermidy, S. Miermont, and P. Vivet, “Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC,” NOCS 2008, Apr. 2008.

E. Beigné and P. Vivet, “Design of On-chip and Off-chip Interfaces for a GALS NoC Architecture,” Proceedings of IEEE International Symposium on Advanced Research in Asynchronous Circuits and Systems, ASYNC'2006, Grenoble, France, pp. 172-181, March 2006.


Thank you