The Asynchronous NOC
Ran Ginosar
NOCS Tutorial
San Diego, 10 May 2009
Ran Ginosar Async Noc Tutorial NOCS 2009 2
SoC
• Each color is a separate clock domain
[Figure: SoC modules interconnected by a network of routers (R)]
SoC
• What clock for the interconnect?
– Fastest?
– Opportunistic?
– None?
[Figure: SoC modules interconnected by a network of routers (R)]
Conceptual Summary
• NOCs are for large SOCs
• Large SOCs = multiple clock domains
→ NOCs should be asynchronous
• Two complementary research areas:
– Asynchronous routers
• simplify design, low power
– Asynchronous interconnect
• high bandwidth, low power
• Problem: need special CAD, special methodology
– Solutions:
• deliver and use as “configurable hard IP core”
• use only at physical design phase
• deliver as predesigned infrastructure (FPGA, SOPC)
Contextual Summary
• Async routers
– QNoC (Technion, Israel)
– Other research (borrowed slides acknowledged and referenced)
• Faust (LETI, France)
• Alpin (LETI, France)
• Mango (DTU, Denmark)
• Async Interconnect
– FOX (Technion, Israel)
The QNoC Async Router
NoC router 101
[Figure: router with input ports, a switch, and output ports]
Single-Service-Level Router
[Figure: single-service-level router with input ports and output ports]
Single-Service-Level Input-Port
Single-Service-Level Output-Port
Adding multiple service levels
Multi-Service-Level Input-Port
[Figure callout: reuse of the Single-Service-Level Input-Port]
Multi-Service-Level Output-Port
[Figure: four Single-Service-Level Output-Ports (SSL-OP, SL0 to SL3) with signals Ro, Ao, Do, feeding a 4-way SPA and an output gate. Callouts: reuse of the Single-Service-Level Output-Port; Inter-Service-Level Arbitration]
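The inter-service-level arbitration shown for the multi-service-level output port can be pictured as a fixed-priority choice among the per-service-level ports. The following is a minimal behavioral sketch in Python, not the asynchronous circuit. It assumes that SL0 has the highest priority and that one pending request is granted per arbitration; both are assumptions made for illustration, and the function name `grant_service_level` is hypothetical.

```python
from typing import Optional, Sequence


def grant_service_level(requests: Sequence[bool]) -> Optional[int]:
    """Return the index of the highest-priority requesting service level.

    Assumption: a lower index means a higher priority (SL0 first).
    Returns None when no service level is requesting.
    """
    for level, requesting in enumerate(requests):
        if requesting:
            return level
    return None


# SL1 and SL3 both request; SL1 wins under the assumed priority order.
print(grant_service_level([False, True, False, True]))  # prints 1
```

A real arbiter must also resolve requests that arrive at nearly the same time, which this sequential model does not capture.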
Buffering and Credits
Monitoring Activity
Performance
Parameter                        ASYNC router   SYNC router   Unit
Max Clock Frequency              -              270.2 **      MHz
Max Data Rate                    75.2           67.6          Mflits/s
Data Cycle                       13.3           14.8 (4)      ns (CLK)
Min Latency (Input to Output)    13.0/9.2 *     3.7 (1)       ns (CLK)
Number of FFs+Latches            620            880
Number of transistors            34,200         70,000
Equivalent Gates (2-in NAND)     8,500          17,500        Gates
Cell Area                        470,000        960,000       µm²
* Latency for the async router is specified for header / body flits.
** The synchronous router has a critical path of ~20 FO4 gate delays, matching or outperforming other published results.
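The performance figures can be cross-checked with a little arithmetic. The sketch below reads the data cycle as 13.3 ns for the async router and 14.8 ns (4 clock cycles) for the sync router, and takes the 270.2 MHz clock as the sync router's. That column assignment is an interpretation of the slide, and the constant names are made up for this example.

```python
# Values read from the performance figures (interpretation, see lead-in).
ASYNC_CYCLE_NS = 13.3
SYNC_CYCLE_NS = 14.8
SYNC_CLOCK_MHZ = 270.2


def rate_mflits(cycle_ns: float) -> float:
    """Flit rate in Mflits/s for a data cycle given in nanoseconds."""
    return 1000.0 / cycle_ns


def clocks_to_ns(clocks: int, clock_mhz: float) -> float:
    """Duration in nanoseconds of a number of clock cycles."""
    return clocks * 1000.0 / clock_mhz


print(round(rate_mflits(ASYNC_CYCLE_NS), 1))       # 75.2
print(round(rate_mflits(SYNC_CYCLE_NS), 1))        # 67.6
print(round(clocks_to_ns(4, SYNC_CLOCK_MHZ), 1))   # 14.8
print(round(clocks_to_ns(1, SYNC_CLOCK_MHZ), 1))   # 3.7
```

The data rates are the reciprocals of the data cycles, and the sync router's 14.8 ns cycle and 3.7 ns latency correspond to 4 and 1 clock periods at 270.2 MHz.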
The critical path spans multiple routers
[Figure: two adjacent routers, each with an input port (IP) and an output port (OP), with the Multi-Service-Level Router Critical Path and the Single-Service-Level Router Critical Path marked across them]
Adding virtual channels
Asynchronous Router 2D Structure
[Figure labels:]
• VC classification
• SL classification
• Sending Request to Output Port
• Virtual Channel Admission Control
• Intra-Service-Level Arbitration
• Inter-Service-Level Arbitration
Conclusions
• Async routers
– less area than sync routers
– no need for global or local clocks
– no need for synchronization
• Highly configurable
– Ports
– Service Levels
– Virtual Channels (inside each service level)
• Dynamic virtual channel allocation
• Fast and fair asynchronous arbitration
• Simulated: 200 Mflits/s @ 0.18 µm
– 5 ports, 4 x SL, 2 x VC
– 45 kgates in total per NoC block unit, providing:
• communication, Quality-of-Service, configurability,
• robustness & multi-clock domains
– OK for units with an average complexity of about 300 kgates (~15%)
• NoC communication overhead
– NI credit mechanism + packet headers: ~10% of total NoC throughput
– Virtual channel (low latency packets): 50% area of NoC node + GALS IF
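The "~15%" figure appears to be the NoC block's gate count relative to the size of the unit it serves. A small check under that reading (the constant names are made up for this example):

```python
NOC_BLOCK_KGATES = 45   # NoC block per unit, from the slide
UNIT_KGATES = 300       # average unit complexity, from the slide

overhead = NOC_BLOCK_KGATES / UNIT_KGATES
print(f"{overhead:.0%}")  # 15%
```

Measured against the combined unit plus NoC block (345 kgates), the share would be somewhat lower, about 13%.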
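[Figure: a Unit with its own clock, attached through an NI and a GALS interface to a NoC NODE with North, West, South, and East ports toward neighboring units]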
ALPIN
CEA-LETI
Grenoble, France
ALPIN
• Claim 1: Async NOC (=GALS SOC) easily enables dynamic voltage and frequency scaling (DVFS)
– Lower voltage to some modules when slow
– Power off to some modules to save leakage
– Sync modules use “pausable clock”
• When voltage and frequency change, local module clock is paused momentarily
• Claim 2: Routers used lightly
– Shut off when idle
• Data cycle: One gate delay between bits
– 15 ps @ 65nm (30ps @ low-power 65nm)
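A data cycle of one gate delay per bit translates directly into a serial bit rate. The sketch below does that conversion for the two gate delays quoted above; the function name `bit_rate_gbps` is hypothetical.

```python
def bit_rate_gbps(gate_delay_ps: float) -> float:
    """Bit rate in Gbit/s when one bit is sent per gate delay (in ps)."""
    return 1000.0 / gate_delay_ps


print(round(bit_rate_gbps(15), 1))  # 66.7
print(round(bit_rate_gbps(30), 1))  # 33.3
```

These are upper bounds set by the data cycle alone; the rate achieved over a real channel also depends on the wires and the driver and receiver circuits.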
Transmitter: Transition Generator
Adapted from M.J.E. Lee, "An Efficient I/O and Clock Recovery for TERABIT Integrated Circuits Design," PhD Thesis, Stanford Univ., 2001.
Transmitter: Fast Async Shift Register
Transmitter: Encoder + Combiner
Example: SR-Element layout
FOX Receiver
Receiver: Decoder + Splitter
Toggle Circuit
• New circuit
• Single gate delay operation
Channel
• Channel driver and receiver
– Differential voltage mode
– Differential current mode
– Single-ended (just repeaters)
• Channel layout
• Interconnect modeling
Differential Voltage Mode
• Current mode differential low-swing transmit
• Differential voltage receive
• Voltage swing → high power, low speed
Differential Current Mode: Goals
• Full Current Mode:
– Send current from TX
– Measure current at RX
• Minimal voltage swing over channel
– Low power, high speed
• Current to voltage conversion at RX
• Long range channel without repeaters
Differential Current Mode: Transmitter and Receiver
[Figure: transmitter and receiver circuits; labels: Output Stage, FS]
Channel wires
• Four wires
• Thick metal
• Single layer: non-disruptive to routing
[Figure: the four channel wires, labeled D, D, S, S]
Channel wire model
\[
Z(s) = sL_{DC} + R_{DC}
     + \frac{s\,\Delta L_1 \cdot \Delta R_1}{s\,\Delta L_1 + \Delta R_1}
     + \frac{s\,\Delta L_2 \cdot \Delta R_2}{s\,\Delta L_2 + \Delta R_2}
\]
[The slide continues with an equivalent expression in terms of ω that is not recoverable from the transcript.]
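One way to read the channel wire model is as a series R_DC and L_DC followed by parallel ΔR, ΔL sections that add resistance as frequency rises. The sketch below evaluates that impedance at s = jω under this assumed reading. The element values and the name `wire_impedance` are hypothetical, chosen only to show the trend, and are not taken from the slides.

```python
import math


def wire_impedance(omega, r_dc, l_dc, ladder):
    """Z(j*omega) = R_DC + j*omega*L_DC + sum of parallel (dR, j*omega*dL) sections."""
    s = 1j * omega
    z = r_dc + s * l_dc
    for d_r, d_l in ladder:
        # Each section is a resistor d_r in parallel with an inductor d_l.
        z += (s * d_l * d_r) / (s * d_l + d_r)
    return z


# Hypothetical element values (ohm, henry), for illustration only.
R_DC, L_DC = 10.0, 1e-9
LADDER = [(5.0, 0.2e-9), (20.0, 0.05e-9)]

for f_ghz in (0.1, 1.0, 10.0, 100.0):
    z = wire_impedance(2 * math.pi * f_ghz * 1e9, R_DC, L_DC, LADDER)
    print(f"{f_ghz:6.1f} GHz  R = {z.real:6.2f} ohm")
```

At low frequency the inductors short out the added resistors and the resistance is R_DC; at high frequency it approaches R_DC plus the sum of the ΔR values.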
FOX Status
• Four years of study and design
• Simulations show:
– 65 Gbps over 7mm
– All corners and ±5σ in-die variations
• Thanks to async operation
• Presently going for fab on IBM 65nm (MOSIS)
FOX Summary
• Fastest possible digital on-chip serial interconnect
– Data rate of a single gate delay
• Asynchronous does it
• To be used as “hard IP core”
Summary
• NOCs are for large SOCs
• Large SOCs = multiple clock domains
→ NOCs should be asynchronous
• We reviewed two complementary areas:
– Async routers
– High speed async serial interconnect
References
• QNoC Async Router
– R. Dobkin, V. Vishnyakov, E. Friedman, R. Ginosar, An asynchronous router for multiple service levels networks on chip, ASYNC 2005.
– R. Dobkin, R. Ginosar and I. Cidon, QNoC Asynchronous Router with Dynamic Virtual Channel Allocation, NOCS 2007.
– R. Dobkin, R. Ginosar and A. Kolodny, QNoC Asynchronous Router, Integration, the VLSI Journal, 42(2):103-115, 2009.
• FAUST
– E. Beigné, F. Clermidy, P. Vivet, A. Clouard, M. Renaudin, An Asynchronous NOC Architecture Providing Low Latency Service and its Multi-level Design Framework, ASYNC 2005.
• ALPIN
– E. Beigné, F. Clermidy, S. Miermont, P. Vivet, Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC, ASYNC 2008.
– Y. Thonnart, E. Beigné, A. Valentian, P. Vivet, Automatic Power Regulation based on an Asynchronous Activity Detection and its Application to ANOC Node Leakage Reduction, NOCS 2008.
• MANGO
– T. Bjerregaard, J. Sparsø, A scheduling discipline for latency and bandwidth guarantees in asynchronous network-on-chip, ASYNC 2005.
– T. Bjerregaard, J. Sparsø, A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip, DATE 2005.
• FOX
– R. Dobkin, R. Ginosar and A. Kolodny, Fast Asynchronous Shift Register for Bit-Serial Communication, ASYNC 2006.
– R. Dobkin, Y. Perelman, T. Liran, R. Ginosar, and A. Kolodny, High rate wave-pipelined asynchronous on-chip bit-serial data link, ASYNC 2007.
– R. Dobkin, A. Morgenshtein, A. Kolodny, R. Ginosar, Parallel vs. Serial On-Chip Communication, SLIP 2008.
– R. Dobkin, M. Moyal, A. Kolodny and R. Ginosar, Asynchronous Current Mode Serial Communication, IEEE Trans. on VLSI, 2009.
• Others
– T. Felicijan, S.B. Furber, An asynchronous on-chip network router with Quality-of-Service (QoS) support, Int. SOC Conf. (2004) 274–277.
More Literature
– S. Moore, G. Taylor, R. Mullins, P. Robinson, Point to point GALS interconnect, ASYNC (2002) 69–75.
– S. Oetiker, F.K. Gürkaynak, T. Villiger, H. Kaeslin, N. Felber, W. Fichtner, Design flow for a 3-million transistor GALS test chip, ACiD Workshop (2003).
– T. Villiger, H. Kaeslin, F.K. Gürkaynak, S. Oetiker, W. Fichtner, Self-timed ring for globally-asynchronous locally-synchronous systems, ASYNC (2003) 141–150.
– J. Muttersbach, T. Villiger, W. Fichtner, Practical design of globally asynchronous locally-synchronous systems, ASYNC (2000) 52–61.
– A.E. Sjogren, C.J. Myers, Interfacing synchronous and asynchronous modules within a high-speed pipeline, TVLSI 8 (5) (2000) 573–583.
– R. Dobkin, R. Ginosar, C.P. Sotiriou, High rate data synchronization in GALS SoCs, TVLSI 14 (10) (2006) 1063–1074.
– Y. Semiat, R. Ginosar, Timing measurements of synchronization circuits, ASYNC (2003) 68–77.
– R. Kol, R. Ginosar, Adaptive synchronization, ICCD (1998) 188–189.
– D.J. Kinniment, Synchronization and Arbitration in Digital Systems, Wiley, New York, 2008.
– R. Ginosar, Fourteen ways to fool your synchronizer, ASYNC (2003) 89–96.
– D.J. Kinniment, A. Yakovlev, Low latency synchronization through speculation, PATMOS (2004) 278–288.
– A. Chakraborty, M.R. Greenstreet, Efficient self-timed interfaces for crossing clock domains, ASYNC (2003) 78–88.
– A. Chakraborty, M.R. Greenstreet, A minimal source–synchronous interface, ASIC/SOC (2002) 443–447.
– S. Chakraborty, J. Mekie, D.K. Sharma, Reasoning about synchronization techniques in GALS systems: a unified approach, FMGALS (2003).
– J. Mekie, S. Chakraborty, D.K. Sharma, Evaluation of pausible clocking for interfacing high speed IP cores in GALS framework, VLSI Des. (2004) 559–564.
– R. Mullins, S. Moore, Demystifying data-driven and pausible clocking schemes, ASYNC (2007) 175–185.
– L. Carloni, A. Sangiovanni-Vincentelli, Coping with latency in SoC design, IEEE Micro (special issue on SoC) 22 (5) (2002) 24–35.
– R. Dobkin and R. Ginosar, Two Phase Synchronization with Sub-cycle Latency, Integration—The VLSI Journal, 2008.
– K. Goossens, J. Dielissen, and A. Radulescu, AEthereal network on chip: Concepts, Architectures, and Implementations, IEEE Design and Test of Computers, Vol. 22(5):414–421, Sept-Oct 2005.
– T. Bjerregaard, S. Mahadevan, A Survey of Research and Practices of Network-on-Chip, ACM Computing Surveys, Vol. 38, March 2006.