April, 2008
Network on Chip
Advanced Topics in VLSI 1 - 049036
Asynchronous vs. Synchronous Design Techniques for NoCs
Presented by: Alex Rekhelis – 308873140
Mentor: Ran Ginosar
Agenda
Introduction
Synchronous On-Chip Networks
– Previous approaches survey
Asynchronous On-Chip Networks
– Previous approaches survey
New approaches
Conclusion
– Sync
– Async
Bibliography
Introduction
Highlight the wide range of synchronous and asynchronous on-chip networks
Contrast different approaches
Present new approaches
Synchronous Router Pipeline
Numerous stages in the router pipeline
– Raise communication latency
– Can make packet buffers less effective
– Incur pipelining overheads
Speculative Router Architecture
VC and switch allocation may be performed concurrently
– Speculate that waiting packets will be successful in acquiring a VC
– Prioritize non-speculative requests over speculative ones
Li-Shiuan Peh and William J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers”, In Proceedings HPCA’01, 2001.
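The prioritization rule above can be sketched in a few lines of Python. This is a minimal illustration, not the allocator from the cited paper: the names `Request`, `allocate_switch` and `usable_grants` are hypothetical, and ties are broken by lowest input port purely for simplicity (a real router would more likely use a round-robin or matrix arbiter).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    input_port: int
    output_port: int
    speculative: bool  # True if the packet has not yet acquired a VC


def allocate_switch(requests):
    """Grant at most one input per output port and one output per input.

    Non-speculative requests are always considered before speculative ones.
    Tie-break (assumption): lowest input port wins.
    """
    grants = {}
    busy_inputs = set()
    # False sorts before True, so non-speculative requests come first.
    for req in sorted(requests, key=lambda r: (r.speculative, r.input_port)):
        if req.output_port in grants or req.input_port in busy_inputs:
            continue
        grants[req.output_port] = req
        busy_inputs.add(req.input_port)
    return grants


def usable_grants(grants, vc_granted_inputs):
    """Keep a speculative grant only if its VC allocation also succeeded."""
    return {
        out: req
        for out, req in grants.items()
        if not req.speculative or req.input_port in vc_granted_inputs
    }
```

A speculative grant whose VC allocation fails in the same cycle is simply discarded, which costs a switch slot but never displaces a non-speculative request.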
Single Cycle Speculative Router
R. D. Mullins, A. West and S. W. Moore, “Low-Latency Virtual-Channel Routers for On-Chip Networks”, In Proceedings ISCA’04.
Single Cycle Speculative Router
Single-cycle router made possible by use of speculation
Clock period is almost unchanged (compared to pipelined design)
– Approx. 30 FO4 (simple standard-cell design)
Presence of clock simplifies design
– Arbitration
  Fast combinational matrix arbiters
  Can easily be extended to handle priority traffic etc.
– Speculation
  Aided by the clear notion of a clock “cycle”
  Simple abort logic (abort detection and actual abort)
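For readers unfamiliar with matrix arbiters, the following Python model is a behavioural sketch under stated assumptions. The class name `MatrixArbiter` and the least-recently-served update policy are illustrative choices, not details taken from the slides.

```python
class MatrixArbiter:
    """Behavioural model of an n-input matrix arbiter (illustrative only)."""

    def __init__(self, n):
        self.n = n
        # prio[i][j] is True when requester i currently beats requester j.
        # Initial order (assumption): lower index has higher priority.
        self.prio = [[i < j for j in range(n)] for i in range(n)]

    def arbitrate(self, requests):
        """requests is a list of n booleans; returns the winner or None."""
        winner = None
        for i in range(self.n):
            # i wins if it beats every other active requester.
            if requests[i] and all(
                self.prio[i][j] or not requests[j]
                for j in range(self.n)
                if j != i
            ):
                winner = i
                break
        if winner is not None:
            # The winner drops to lowest priority (least recently served).
            for j in range(self.n):
                if j != winner:
                    self.prio[winner][j] = False
                    self.prio[j][winner] = True
        return winner
```

In hardware the per-requester check is a single level of combinational logic evaluated in parallel, which is why the slide calls these arbiters fast; the loop here only mimics that behaviour.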
Synchronous architecture techniques (almost asynchronous)
Router clocks derived from a single source
Locally generated clocks (periodic and free-running)
Synchronous routers with asynchronous links
Locally clocked routers / asynchronous interconnect (GALS-style network)
– Can support asynchronous interconnects
  No longer exploiting periodic nature of router clocks
  Correct operation is independent of the delay of the link
– GALS interfaces with pausable clocks
  If necessary the clock is stretched; data is always transferred reliably (value safe)
  Need to construct local delay line
Local aperiodic clock generation
Data-driven local clock
– Similarities to stoppable GALS interface and asynchronous priority arbiters
GALS – Clock Pausing
Simple GALS interface (receiver)
Note: Req/Ack uses 2-phase handshaking protocol
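As a reminder of what 2-phase (transition-signalling) handshaking means, here is a minimal Python sketch. The class `TwoPhaseChannel` and its method names are hypothetical; the model only tracks signal levels and ignores timing.

```python
class TwoPhaseChannel:
    """Two-phase Req/Ack channel: every transition on a wire is an event."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None

    def sender_ready(self):
        # Equal levels mean the previous transfer has been acknowledged.
        return self.req == self.ack

    def send(self, value):
        if not self.sender_ready():
            raise RuntimeError("previous transfer not yet acknowledged")
        self.data = value
        self.req ^= 1  # toggle Req: signals that new data is valid

    def data_pending(self):
        return self.req != self.ack

    def receive(self):
        if not self.data_pending():
            raise RuntimeError("no data pending")
        value = self.data
        self.ack ^= 1  # toggle Ack: signals that data was consumed
        return value
```

Each transfer costs one transition on Req and one on Ack, with no return-to-zero phase, which is the main difference from a 4-phase protocol.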
Synchronous Routers - Summary
Can design high-performance single-cycle routers
Design is simplified by presence of global synchrony
Distribution of global clock can be eased by
– New clock generation / distribution techniques
– Source-synchronous communication
Network operating frequency
– Relax global synchrony further
– Data-driven clocking determines most appropriate router clock frequency automatically
Why are asynchronous NoCs interesting?
No clock distribution, simple solution
Networked IP blocks run at different clock frequencies
– No synchronization issues at interfaces
Ability to exploit data / path-dependent delays
– Low-latency common or high-priority paths through router
Freedom to optimize network links
– Not constrained by need to distribute/generate multiple clock frequencies. Can exploit high-frequency narrow links
– Dynamic latency/throughput trade-offs (adaptive pipeline depth)
– Exploit dynamic optimizations on links (e.g. DVS)
Easy-to-use interfaces, modularity, robust and simple implementation, reduced design time
Some arguments for reduced power
Asynchronous Circuit Basics
Control in asynchronous circuits often relies on simple handshaking protocols (req / ack event cycles)
Delay-insensitive event-driven system - every signal transition is acknowledged
The C-element is a fundamental building block of many asynchronous circuits
– Can be thought of as an AND-gate for events
Arbitration
– Mutex
– Tree arbiter element
– Multiway arbiter
– Static priority arbiters
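The "AND-gate for events" behaviour of the C-element can be made concrete with a small Python model. The class name `CElement` is an illustrative choice; the model covers the standard two-input Muller C-element and ignores gate delays.

```python
class CElement:
    """Two-input Muller C-element (behavioural model, no timing)."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # The output follows the inputs only when they agree;
        # otherwise it holds its previous value.
        if a == b:
            self.out = a
        return self.out
```

The output rises only after both inputs have risen and falls only after both have fallen, so an output transition confirms that an event has occurred on every input.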
Asynchronous on-chip networks
How do we build more complex on-chip routers?
– Support for virtual channels
– QoS
Challenges
– Multi-way & prioritised arbitration
– Control overheads
Arbitration and delay-insensitive circuits can be slow! How can control overheads be hidden?
Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC
E. Beigné, F. Clermidy, S. Miermont, and P. Vivet
NOCS 2008
Outline of the paper
Dynamic and Static power consumption issues
NoC architecture for DVFS support
NoC Unit architecture
NoC Unit design
DVFS execution at system level
Power gain & physical implementation
Dynamic and Static power consumption issues
Dynamic power consumption reduction
– Reduce: switching activity, capacitances, supply voltage, frequency
Static power consumption reduction
– Reduce: supply voltage, dominant leakage currents: ISTH, IGIDL, IGATE
– Multi-VTH design
Low-power techniques must exist at all design levels, from architecture to physical implementation
The paper proposes a fully integrated solution to locally control dynamic and static power at unit level within a GALS NoC
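The levers listed on this slide map onto the usual first-order CMOS power model. The expressions below are the standard textbook approximation, not formulas taken from the slides; the symbols α (switching activity), C (switched capacitance), V_dd (supply voltage), f (clock frequency) and I_leak (total leakage current) are naming assumptions.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f
\qquad
P_{\text{static}} \approx V_{dd} \, I_{\text{leak}},
\quad
I_{\text{leak}} \approx I_{STH} + I_{GIDL} + I_{GATE}
```

Supply voltage is the only term that appears in both expressions, and it enters the dynamic term quadratically, which is why voltage scaling is the main lever in the architecture that follows.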
NoC architecture for DVFS (1) – NoC architecture
A fully asynchronous Network-on-Chip
IP units are synchronous islands using a programmable Local Clock Generator
Within the IP unit
– Synchronization is done using a Pausable Clock
– A Power Unit manages internal Vcore, generated using external Vhigh and Vlow
– A Network Interface is in charge of NoC communications
Local Power Management
Main CPU in charge of global power management
Local fine-grain power management can be executed during IP computation and communication, independently of the other units
NoC architecture for DVFS (2) – Main Principles
Each synchronous IP is an independent power and frequency domain
A local fine-grain Dynamic Voltage Scaling:
– Implementation of a local hardware controller to control transitions between Vhigh and Vlow
– Ensures smooth DVS transitions for IP safe computation
A local fine-grain Dynamic Frequency Scaling:
– Automatic frequency scaling
– Use of clock generation re-programming to find the optimal V/F point of operation
Thanks to the pausable clock technique, the IP unit continues its operation during DVFS phases
GALS architecture and local clock generation are a natural enabler for easy local DVFS
NoC Unit architecture
Each IP core encapsulated with
– Network Interface
– Test Wrapper
– Pausable Clock
– Power Supply Unit
IP units have 5 supply modes
– Init: reset at Vhigh (1.2V)
– High: Vhigh supply
– Low: Vlow supply (0.8V)
– Hopping: switch Vhigh / Vlow for DVFS
– Idle: retention state at Vlow (no clock)
– Off: stand-by mode
NoC Unit design (1) - Local Power Manager
Local Power Manager handles unit power modes
A set of programmable registers, accessible through the NoC
Configuration of
– Programmable delay line
– Power Supply Unit
Pulse-width modulator used to control the Hopping mode
NoC Unit design (2) – Power Supply Unit
Power Supply Unit manages Vcore
Two power switches Thigh and Tlow, LVT transistors
A Hopping Unit
An Ultra Cut-Off Generator
Local Power Supply Unit offers safe control of the internal power supply depending on pre-defined power modes
NoC Unit design (3) – Hopping Unit
Energy per operation scales with V²
– Decrease voltage (and frequency) to be energy efficient
«Triple state» power supply
– Use of two PMOS power switches
  Vhigh (1.2 V), Vlow (0.7 V), or OFF (0 V)
Switch between Vhigh and Vlow
– Transitions take less than 100 ns
– Mean speed / mean power of the IP is programmed by a PWM
Compatible with synchronous and asynchronous IPs
– For GALS system: coordination done with local clock generator
Can easily be integrated in any CMOS circuit
– No inductor, contrary to traditional DC/DC converters
– No capacitor, contrary to charge-pump implementation
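How a PWM duty cycle sets mean speed and mean power can be sketched as follows. This is a first-order estimate under explicit assumptions: dynamic power is modelled as C·V²·f, transition time (under 100 ns on the slide) is neglected, and the function name `hopping_average`, the effective capacitance `c_eff` and the two clock frequencies are hypothetical inputs, since the slides do not give frequencies.

```python
def hopping_average(duty_high, v_high, f_high, v_low, f_low, c_eff=1.0):
    """Mean clock frequency and mean dynamic power under voltage hopping.

    duty_high is the fraction of time spent at (v_high, f_high); the rest
    is spent at (v_low, f_low). Units are arbitrary but must be consistent.
    """
    if not 0.0 <= duty_high <= 1.0:
        raise ValueError("duty_high must lie in [0, 1]")
    p_high = c_eff * v_high ** 2 * f_high  # dynamic power at the high point
    p_low = c_eff * v_low ** 2 * f_low     # dynamic power at the low point
    mean_f = duty_high * f_high + (1.0 - duty_high) * f_low
    mean_p = duty_high * p_high + (1.0 - duty_high) * p_low
    return mean_f, mean_p
```

For example, `hopping_average(0.5, 1.2, 2.0, 0.7, 1.0)` uses the slide's 1.2 V and 0.7 V levels with made-up relative frequencies of 2.0 and 1.0, and returns the time-weighted mean of the two operating points.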
NoC Unit design (4) – Ultra Cut-Off Generator
When reverse polarizing the gate, the leakage current goes through a minimum
The optimal polarization point varies with the temperature, the supply voltage and the process corners
The proposed UCO generator automatically polarizes the gate of the Power switch to its point of minimum leakage
Compensates for temperature variation and alleviates process-corner variations.
The gate oxide reliability is considered by introducing a passive stress reduction mechanism
NoC Unit design (5) – Pausable Clock Interface
Temporarily pause the clock when a transfer (NoC) or a supply switch is required
Based on
– Two GALS ports: Synchronous-to-Asynchronous and Asynchronous-to-Synchronous
– A programmable delay line
– A pausable clock generator
Pausable Clock Generator arbitrates pause requests
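The effect of pausing on the clock waveform can be illustrated with a toy timing model. It is a sketch under simplifying assumptions: the function name `rising_edges` is hypothetical, pause requests are given as known time intervals, and the arbitration between a pause request and the next clock edge (a mutex in real designs) is reduced to a simple interval check.

```python
def rising_edges(period, pauses, n_edges):
    """Return the times of the first n_edges rising clock edges.

    period is the delay-line period; pauses is a list of (start, end)
    intervals during which a rising edge must not be produced.
    """
    edges = []
    t = 0.0
    while len(edges) < n_edges:
        for start, end in sorted(pauses):
            if start <= t < end:
                # Stretch the low phase until the pause request is released.
                t = end
        edges.append(t)
        t += period  # the next edge is one delay-line period later
    return edges
```

With a period of 10 time units and a pause from 15 to 27, the edge that would have fallen at 20 is deferred to 27 and the clock then resumes with its normal period, so the cycle is stretched rather than cut short.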
NoC Unit design (5) – Pausable Clock Interface
Programmable delay line
– Precise, small and low power
– Using standard cells
– On the same unit power domain
Pausable Clock Interface allows efficient synchronization and safe dynamic voltage and frequency scaling with minimal latency cost
Power gain
Programmable delay line matches the unit logic on the same power domain
– Compensates any mismatch through re-programming
Power reduction
– Vhigh=1.2V and Vlow=0.8V
– 35 % dynamic power reduction between High and Low modes
– Hopping mode is used to save power without any latency cost
– Leakage power is reduced by 2 decades thanks to UCO
Power Supply Unit efficiency
– Hopping Unit
  Only resistive losses in the power transistors
  About 1 mW dynamic power
– => more than 95 % power efficiency
– 90 % total efficiency (external DC-DC taken into account)
An adaptive and reliable Power Supply Unit giving a high power reduction factor and high power efficiency
Physical Implementation
Power Switch
– One single power switch for the complete power domain
– Sized to get a speed loss <5%
– Area: under about 5% of the power domain
Hopping Unit
– Area: 140 μm × 35 μm
– Hopping transition: <100 ns
What do we see in recent work?
A fully integrated DVFS architecture within a GALS NoC
We are able to handle leakage problems due to technology scaling
– Insertion of power switches
Dynamic power reduction is possible through voltage scaling:
– Hopping
– Management of multiple power domains in a complex SoC
Knowledge of GALS systems leads to automatic frequency scaling
– Pausable clock interfaces
– Asynchronous implementation
Comparing Approaches
Published work on asynchronous routers and networks is scarce and lacks detailed explanation; simulation is difficult, at times almost impossible
Single latency / throughput figures don’t tell the whole story
Detailed comparative studies with real traffic are required
Often difficult to isolate the impact of the choice of system timing style, because many things tend to be different:
– Technology, circuit style, architecture
Problems in comparing synchronous and asynchronous designs
Questions about Asynchronous design?
Testing asynchronous circuits
– An asynchronous circuit replaces the clock with a large number of distributed state-holding elements
– Large area overhead associated with test
– Testing of non-deterministic elements (MUTEX)
Performance
– “Asynchronous circuits avoid issues of timing closure, they are correct-by-construction”
– But performance guarantees are still required. Slow synchronous circuits are easy to build!
– Value safe versus time safe
– Predicting performance is complex
Perhaps on-chip communication is an application where such characteristics can be tolerated?
Synchronous or Asynchronous?
A clockless on-chip network appears to be an elegant solution, although some questions remain:
– Test
– Performance concerns
Shouldn’t asynchronous designs offer latency advantages?
– Fast local control, path/data-dependent delays, DI interconnects
Perhaps asynchronous routers mimic synchronous architectures too closely?
– Exploit flexibility, novel architectures, different topologies
Overheads for data-driven clocking or GALS currently look small in comparison
Synchronous design has advantages too
– Predictability and determinism can be exploited
  Fast single-cycle routers possible
– Global snapshot of state is good for scheduling
Still lots of interesting research to be done
– Need more data points
Conclusions
High cost associated with both global synchrony and delay-insensitive circuits
– Can relax constraints in both directions
Which techniques achieve the best cost/benefit mix for on-chip networks?
– Data-driven clocks look promising
[Figure labels: SYNCHRONOUS, ASYNCHRONOUS, ?]
Bibliography
R. Mullins, “Asynchronous versus synchronous design techniques for NoCs,” invited lecture given at SoC 2005, part of a tutorial entitled “The status of the NoC revolution: design methods, architectures and silicon implementation,” Tampere, Finland, November 2005.
E. Beigné, F. Clermidy, S. Miermont, and P. Vivet, “Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC,” NOCS 2008, Apr. 2008.
E. Beigné and P. Vivet, “Design of On-chip and Off-chip Interfaces for a GALS NoC Architecture,” Proceedings of IEEE International Symposium on Advanced Research in Asynchronous Circuits and Systems, ASYNC'2006, Grenoble, France, pp. 172-181, March 2006.