L17-1 MIT 6.823 Spring 2020 On-Chip Networks II: Router Microarchitecture & Routing Mengjia Yan Computer Science & Artificial Intelligence Lab M.I.T. Based on slides from Daniel Sanchez April 16, 2020

On-Chip Networks II: Router Microarchitecture & Routing

Feb 01, 2022

Page 1: On-Chip Networks II: Router Microarchitecture & Routing

MIT 6.823 Spring 2020

On-Chip Networks II: Router Microarchitecture & Routing

Mengjia Yan
Computer Science & Artificial Intelligence Lab
M.I.T.

Based on slides from Daniel Sanchez

April 16, 2020

Page 3: On-Chip Networks II: Router Microarchitecture & Routing

Reminder: Wormhole Flow Control

• Each router manages buffers in flits
• Each packet is sent through the output link as soon as possible (without waiting for all its flits to arrive)
• Router buffers are not large enough to hold a full packet → on congestion, a packet’s flits are often buffered across routers
• Problem: On congestion, links assigned to a blocked packet cannot be used by other packets

[Figure: wormhole example with packets A and B (flits B0, B1); one link is marked "blocked" and another "idle".]

Page 4: On-Chip Networks II: Router Microarchitecture & Routing

Virtual-Channel (VC) Flow Control

• When a packet blocks, instead of holding on to the channel, it holds on to a virtual channel
• Virtual channel = channel state + flit buffers
• Multiple virtual channels reduce blocking
• Ex: Wormhole (= 1 VC/channel) vs. 2 VCs/channel

[Figure: VC flow control with 2 VCs/channel; flits A0, A1, A2 and B0, B1 occupy separate VCs while one packet is blocked.]
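As a rough illustration of why extra VCs reduce blocking, the sketch below models a single physical link whose VCs are claimed by packets in arrival order. The function name `packets_able_to_send` and the rule of one VC per packet, first come first served, are assumptions made for this example; the slides do not prescribe them.

```python
def packets_able_to_send(num_vcs, packets):
    """Return the packets that can still push flits over one physical link.

    packets: list of (name, is_blocked_downstream) in arrival order.
    Assumption: each packet claims one VC, first come, first served.
    """
    vc_holders = packets[:num_vcs]          # later arrivals get no VC
    return [name for name, blocked in vc_holders if not blocked]

arrivals = [("B", True), ("A", False)]      # B arrived first and is blocked
print(packets_able_to_send(1, arrivals))    # wormhole: 1 VC/channel
print(packets_able_to_send(2, arrivals))    # 2 VCs/channel
```

With one VC the blocked packet B holds the only channel state, so A cannot use the link; with two VCs A gets its own VC and proceeds.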

Page 5: On-Chip Networks II: Router Microarchitecture & Routing

Router Microarchitecture

Page 6: On-Chip Networks II: Router Microarchitecture & Routing

Ring-based Interconnect

[Figure: nodes, each labeled P and C, attached to a ring.]

Page 9: On-Chip Networks II: Router Microarchitecture & Routing

Ring Stop

[Figure: ring stop with an input port, input buffer, output port, output buffer, latch, and control logic.]

Allow input if no traffic on ring

Q: If there is traffic on the ring, should traffic on the ring or new input get priority?

Components in a router
• Buffer
• Switch
• Logic

Page 10: On-Chip Networks II: Router Microarchitecture & Routing

Ring Flow Control: Priorities

Rotary Rule – traffic in ring has priority

Page 14: On-Chip Networks II: Router Microarchitecture & Routing

Ring Flow Control: Bounces

• What if traffic on the ring cannot get delivered, e.g., if output FIFO is full?
• One alternative: Continue on ring (bounce)
• What are the consequences of such bounces?
  – Traffic on ring no longer FIFO
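The rotary rule and the bounce behaviour can be sketched as one per-cycle decision at a ring stop. This is a minimal model, not the design of any particular chip: the function `ring_stop`, the dict-based flit format, and the assumption that a delivered flit frees its ring slot for injection in the same cycle are all hypothetical.

```python
from collections import deque

def ring_stop(ring_in, inject_q, eject_q, eject_capacity, my_id):
    """One cycle at a ring stop; returns the flit sent on the outgoing ring link (or None).

    ring_in: flit arriving on the ring, a dict with a "dst" key, or None.
    Assumption: a delivered flit frees the slot for a local injection.
    """
    if ring_in is not None:
        if ring_in["dst"] != my_id:
            return ring_in                  # rotary rule: ring traffic has priority
        if len(eject_q) >= eject_capacity:
            return ring_in                  # output FIFO full: bounce around the ring
        eject_q.append(ring_in)             # deliver to the local node
    # The ring slot is empty here, so a new input is allowed.
    return inject_q.popleft() if inject_q else None

eject = deque([{"dst": 2, "name": "old"}])  # output FIFO of stop 2 is full
x = {"dst": 2, "name": "X"}
y = {"dst": 2, "name": "Y"}
out1 = ring_stop(x, deque(), eject, 1, my_id=2)   # X bounces
eject.popleft()                                   # local node drains its FIFO
out2 = ring_stop(y, deque(), eject, 1, my_id=2)   # Y is delivered ahead of X
```

X was sent before Y but is still circulating when Y is delivered, which is the loss of FIFO ordering the slide points out.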

Page 15: On-Chip Networks II: Router Microarchitecture & Routing

General Interconnect
Tilera, Knights Landing…

[Figure: array of nodes, each labeled P and C, joined by a general interconnect.]

Page 16: On-Chip Networks II: Router Microarchitecture & Routing

What’s In A Router?

• It’s a system as well
  – Logic – State machines, Arbiters, Allocators
    • Control data movement through router
    • Idle, Routing, Waiting for resources, Active
  – Memory – Buffers
    • Store flits before forwarding them
    • SRAMs, registers, processor memory
  – Communication – Switches
    • Transfer flits from input to output ports
    • Crossbars, multiple crossbars, fully-connected, bus
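The four states listed under "Logic" can be read as a small state machine. The sketch below is one plausible reading: the state names come from the slide, while the event names and the transition table are hypothetical and only meant to show the idea.

```python
from enum import Enum, auto

class VCState(Enum):
    IDLE = auto()
    ROUTING = auto()
    WAITING = auto()   # waiting for resources (e.g., an output VC)
    ACTIVE = auto()

def next_state(state, event):
    """Hypothetical transition table; the event names are illustrative only."""
    table = {
        (VCState.IDLE, "head_arrives"): VCState.ROUTING,
        (VCState.ROUTING, "route_computed"): VCState.WAITING,
        (VCState.WAITING, "resources_granted"): VCState.ACTIVE,
        (VCState.ACTIVE, "tail_departs"): VCState.IDLE,
    }
    return table.get((state, event), state)   # unknown event: stay put

state = VCState.IDLE
for event in ["head_arrives", "route_computed", "resources_granted", "tail_departs"]:
    state = next_state(state, event)
```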

Page 17: On-Chip Networks II: Router Microarchitecture & Routing

Virtual-channel Router

[Figure: virtual-channel router block diagram (label: Flit).]

Page 18: On-Chip Networks II: Router Microarchitecture & Routing

Router Pipeline vs. Processor Pipeline

Router pipeline
• Logical stages:
  – BW (buffer write)
  – RC (route computation)
  – VA (virtual-channel allocation)
  – SA (switch allocation)
  – BR (buffer read)
  – ST (switch traversal)
  – LT (link traversal)
• Different flits go through different stages
• Different routers have different variants
  – E.g. speculation, lookaheads, bypassing
• Different implementations of each pipeline stage

Processor pipeline
• Logical stages:
  – IF
  – ID
  – EX
  – MEM
  – WB
• Different instructions go through different stages
• Different processors have different variants
  – E.g. speculation, ISA
• Different implementations of each pipeline stage

Page 19: On-Chip Networks II: Router Microarchitecture & Routing

Baseline Router Pipeline

• Route computation performed once per packet
• Virtual channel allocated once per packet
• Body and tail flits inherit this info from head flit

[Figure: pipeline timing diagram, cycles 1–7 shown]
Head:   BW RC VA SA ST LT
Body 1: BW SA ST LT
Body 2: BW SA ST LT
Tail:   BW SA ST LT
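The timing in the diagram can be reproduced with a short calculation. This sketch assumes that flit i is written into the buffer in cycle i + 1, that no allocation ever stalls, and that switch allocation handles the flits of a packet in order, one per cycle; `baseline_schedule` is a name made up for the example.

```python
def baseline_schedule(num_flits):
    """Map each flit of one packet to {stage: cycle} at a single router.

    Assumptions: flit i does BW in cycle i + 1, nothing stalls, and
    the flits of a packet win switch allocation in order, one per cycle.
    """
    schedule = []
    for i in range(num_flits):
        bw = i + 1
        if i == 0:
            # Only the head flit performs RC and VA.
            stages = {"BW": bw, "RC": bw + 1, "VA": bw + 2, "SA": bw + 3}
        else:
            # Body and tail flits inherit route and VC but follow the flit ahead.
            prev_sa = schedule[i - 1]["SA"]
            stages = {"BW": bw, "SA": max(bw + 1, prev_sa + 1)}
        stages["ST"] = stages["SA"] + 1
        stages["LT"] = stages["ST"] + 1
        schedule.append(stages)
    return schedule

for name, stages in zip(["Head", "Body 1", "Body 2", "Tail"], baseline_schedule(4)):
    print(name, stages)
```

Under these assumptions the head flit finishes link traversal in cycle 6 and each later flit one cycle after the one before it.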

Page 21: On-Chip Networks II: Router Microarchitecture & Routing

Allocators In Routers

• VC Allocator
  – Input VCs request any one of a range of output VCs
  – Example: A packet on VC0 arrives at the East input port. It is destined for the West output port and would like to get any of the VCs of that output port.
• Switch Allocator
  – Input VCs of an input port request different output ports (e.g., one is going North, another is going West)
• “Greedy” algorithms used for efficiency
• What happens if allocation fails on a given cycle?
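A greedy switch allocator can be sketched in a few lines. The version below is an illustration under assumptions: requests are served in fixed list order (hardware allocators often rotate priority instead), and each input port and output port can be granted once per cycle. The function name and request tuple layout are hypothetical.

```python
def greedy_switch_allocate(requests):
    """Grant at most one request per input port and per output port.

    requests: list of (input_port, input_vc, output_port) in priority order.
    Assumption: fixed priority by list position, chosen only for simplicity.
    """
    busy_inputs, busy_outputs, grants = set(), set(), []
    for in_port, in_vc, out_port in requests:
        if in_port not in busy_inputs and out_port not in busy_outputs:
            grants.append((in_port, in_vc, out_port))
            busy_inputs.add(in_port)
            busy_outputs.add(out_port)
    return grants

requests = [
    ("East", 0, "West"), ("East", 1, "North"),
    ("South", 0, "West"), ("South", 1, "North"),
]
grants = greedy_switch_allocate(requests)
losers = [r for r in requests if r not in grants]   # these stall and retry later
```

In this sketch a request that fails allocation simply stays in `losers` and would be presented again in a later cycle, which is one way to read the stall behaviour the last bullet asks about.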

Page 22: On-Chip Networks II: Router Microarchitecture & Routing

VC & Switch Allocation Stalls

[Figure: VC and switch allocation stall examples.]

Page 24: On-Chip Networks II: Router Microarchitecture & Routing

Pipeline Optimizations: Lookahead Routing [Galles, SGI Spider Chip]

• At current router, perform route computation for next router
  – Head flit already carries output port for next router
  – RC just has to read output → fast, can be overlapped with BW
  – Precomputing route allows flits to compete for VCs immediately after BW
  – Routing computation for the next hop (NRC) can be computed in parallel with VA
• Or simplify RC (e.g., X-Y routing is very fast)

[Figure: pipeline stages BW/RC, VA/NRC, SA, ST, LT, with RC overlapped with BW and NRC overlapped with VA.]
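The lookahead idea can be sketched as follows: the head flit arrives already knowing its output port at the current router, and the current router computes the port for the next one. The flit layout, the helper names, the use of X-Y routing, and the convention that North is +y are all assumptions for this example.

```python
def xy_output_port(cur, dst):
    """X-Y routing on a mesh: correct X first, then Y. Nodes are (x, y)."""
    if dst[0] != cur[0]:
        return "E" if dst[0] > cur[0] else "W"
    if dst[1] != cur[1]:
        return "N" if dst[1] > cur[1] else "S"
    return "EJECT"

STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def lookahead_hop(cur, head_flit):
    """Forward using the carried port, then precompute the next router's port (NRC)."""
    port = head_flit["out_port"]            # read, not computed, at this router
    if port == "EJECT":
        return cur
    nxt = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])
    head_flit["out_port"] = xy_output_port(nxt, head_flit["dst"])
    return nxt

src, dst = (0, 0), (2, 1)
flit = {"dst": dst, "out_port": xy_output_port(src, dst)}   # set at injection
pos, visited = src, [src]
while flit["out_port"] != "EJECT":
    pos = lookahead_hop(pos, flit)
    visited.append(pos)
```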

Page 25: On-Chip Networks II: Router Microarchitecture & Routing

Pipeline Optimizations: Speculative Switch Allocation [Peh & Dally, 2001]

• Assume that the Virtual Channel Allocation stage will be successful
  – Valid under low to moderate loads
• If both successful, VA and SA are done in parallel
• If VA unsuccessful (no virtual channel returned)
  – Must repeat VA/SA in next cycle
• Prioritize non-speculative requests

[Figure: pipeline stages BW/RC, VA/SA, ST, LT, with VA and SA in the same stage.]
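To compare the pipelines, the sketch below counts stages per hop for a head flit. The stage groupings are read off the pipeline diagrams on the baseline, lookahead, and speculative slides; the function `head_latency` and the rule that each failed VA attempt costs one extra cycle are assumptions for illustration.

```python
PIPELINES = {
    "baseline":    [["BW"], ["RC"], ["VA"], ["SA"], ["ST"], ["LT"]],
    "lookahead":   [["BW", "RC"], ["VA", "NRC"], ["SA"], ["ST"], ["LT"]],
    "speculative": [["BW", "RC"], ["VA", "SA"], ["ST"], ["LT"]],
}

def head_latency(pipeline, failed_va_attempts=0):
    """Cycles for a head flit to cross one router and its output link.

    Assumption: every failed VA attempt repeats that stage for one cycle.
    """
    return len(PIPELINES[pipeline]) + failed_va_attempts

for name in PIPELINES:
    print(name, head_latency(name))
```

A failed speculation (`head_latency("speculative", 1)`) costs one more cycle, which under this model matches the lookahead pipeline without speculation.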

Page 26: On-Chip Networks II: Router Microarchitecture & Routing

Routing

Page 27: On-Chip Networks II: Router Microarchitecture & Routing

Properties of Routing Algorithms

• Deterministic/Oblivious
  – route determined by (source, dest)
  – not affected by network state (i.e. traffic)
• Adaptive
  – route influenced by traffic along the way
• Minimal
  – only selects shortest paths
• Deadlock-free
  – no traffic pattern can lead to a situation where no packets move forward

Page 30: On-Chip Networks II: Router Microarchitecture & Routing

Network Deadlock

• Flow A holds channels u and v but cannot make progress until it acquires channel w
• Flow B holds channels w and x but cannot make progress until it acquires channel u

[Figure: four nodes (0, 1, 2, 3) connected by channels u, v, w, x, with flows A and B.]

Circular dependences of resources
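The circular dependence between flows A and B can be checked mechanically. The sketch below follows "who holds the channel I want" links until it returns to the starting flow; the function name and the dictionary layout are choices made for this example.

```python
def find_deadlock(holds, wants):
    """Return a list of flows forming a circular wait, or None.

    holds: flow -> set of channels it holds.
    wants: flow -> the channel it needs next.
    """
    owner = {ch: flow for flow, chans in holds.items() for ch in chans}
    for start in wants:
        seen, flow = [], start
        while flow is not None and flow not in seen:
            seen.append(flow)
            flow = owner.get(wants.get(flow))   # who holds what this flow wants?
        if flow == start:
            return seen
    return None

holds = {"A": {"u", "v"}, "B": {"w", "x"}}
wants = {"A": "w", "B": "u"}
cycle = find_deadlock(holds, wants)
```

If B did not need channel u, the chain would end at B and the function would return `None`.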

Page 32: On-Chip Networks II: Router Microarchitecture & Routing

Dimension-Order Routing

[Figure: two meshes showing routes from sources SA, SB, SC to destinations DA, DB, DC, one under XY-order and one under YX-order.]

XY-order: uses 2 out of 4 turns
YX-order: uses 2 out of 4 turns

XY is deadlock free, YX is deadlock free, what about XY+YX? No!
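A minimal sketch of dimension-order routing on a mesh follows, together with a helper that lists the turns a path makes. Nodes are (x, y) pairs with East as +x and North as +y; that convention and the function names are assumptions for the example.

```python
def dor_route(src, dst, order="XY"):
    """Nodes visited by dimension-order routing on a mesh; nodes are (x, y)."""
    pos, path = list(src), [src]
    for d in ((0, 1) if order == "XY" else (1, 0)):
        step = 1 if dst[d] > pos[d] else -1
        while pos[d] != dst[d]:
            pos[d] += step
            path.append(tuple(pos))
    return path

def turns(path):
    """Set of (incoming, outgoing) direction pairs where the path changes direction."""
    name = {(1, 0): "E", (-1, 0): "W", (0, 1): "N", (0, -1): "S"}
    dirs = [name[(b[0] - a[0], b[1] - a[1])] for a, b in zip(path, path[1:])]
    return {(p, q) for p, q in zip(dirs, dirs[1:]) if p != q}

xy = dor_route((0, 0), (2, 1), "XY")
yx = dor_route((0, 0), (2, 1), "YX")
print(turns(xy), turns(yx))
```

XY-order paths only ever turn from an X direction into a Y direction, and YX-order paths only the reverse. Allowing both at once permits both kinds of turn, which is what lets dependences close into a cycle.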

Page 33: On-Chip Networks II: Router Microarchitecture & Routing

DOR – Turns allowed

• One way of looking at whether a routing algorithm is deadlock free is to look at the turns allowed.
• Deadlocks may occur if turns can form a cycle

[Figure: allowed turns under the XY model and the YX model.]

Page 34: On-Chip Networks II: Router Microarchitecture & Routing

Allowing more turns

• Allowing more turns may allow adaptive routing, but also deadlock

[Figure: six turn model.]

Page 36: On-Chip Networks II: Router Microarchitecture & Routing

Turn Model [Glass and Ni, 1994]

• A systematic way of generating deadlock-free routes with a small number of prohibited turns
• Deadlock-free if routes conform to at least ONE of the turn models (acyclic channel dependence graph)

[Figure: West-First turn model and North-Last turn model.]
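A route can be checked against a turn model by listing its prohibited turns. In the sketch below a turn is a pair (direction travelled before, direction travelled after); the table reflects the usual definitions of west-first (no turn into West) and north-last (no turn out of North), and 180-degree turns are assumed not to occur. The names `PROHIBITED` and `conforms` are made up for the example.

```python
PROHIBITED = {
    "west-first": {("N", "W"), ("S", "W")},   # never turn into West
    "north-last": {("N", "E"), ("N", "W")},   # never turn out of North
}

def conforms(directions, model):
    """True if a hop-by-hop direction list makes no turn the model prohibits."""
    made = set(zip(directions, directions[1:]))
    return not (made & PROHIBITED[model])

print(conforms(["W", "W", "N", "E"], "west-first"))   # goes West first
print(conforms(["N", "W"], "west-first"))             # turns into West later
```

A route that conforms to one model need not conform to the other: `["N", "E"]` is fine under west-first but makes a prohibited turn under north-last.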

Page 37: On-Chip Networks II: Router Microarchitecture & Routing

2-D Mesh and CDG

Can create a channel dependency graph (CDG) of the network.

• Vertices in the CDG represent network links
• Edges in the CDG represent allowed routes
• Disallowing 180° turns, e.g., AB → BA

[Figure: 2-D mesh with nodes A–F and its CDG.]

Page 38: On-Chip Networks II: Router Microarchitecture & Routing

Cycles in CDG

The channel dependency graph D derived from the network topology may contain many cycles

[Figure: CDG of the mesh with nodes A–F.]

Flow routed through links AB, BE, EF
Flow routed through links EF, FA, AB
⇒ Deadlock!
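The two flows on the slide are enough to close a cycle in the CDG. The sketch below builds only the CDG edges that these routes actually use (a simplification of the full topology-derived graph) and looks for a cycle with a depth-first search; the function names are chosen for this example.

```python
def build_cdg(routes):
    """CDG edges: link a -> link b when some route uses b right after a."""
    cdg = {}
    for route in routes:
        for a, b in zip(route, route[1:]):
            cdg.setdefault(a, set()).add(b)
            cdg.setdefault(b, set())
    return cdg

def has_cycle(graph):
    """Depth-first search; a 'grey' vertex reached again means a cycle."""
    state = {}                               # absent = not yet visited
    def visit(node):
        state[node] = "grey"
        for nxt in graph[node]:
            if state.get(nxt) == "grey" or (nxt not in state and visit(nxt)):
                return True
        state[node] = "black"
        return False
    return any(node not in state and visit(node) for node in graph)

routes = [["AB", "BE", "EF"], ["EF", "FA", "AB"]]
print(has_cycle(build_cdg(routes)))          # both flows together
print(has_cycle(build_cdg(routes[:1])))      # first flow alone
```

Together the flows create the dependence chain AB, BE, EF, FA and back to AB; either flow alone leaves the graph acyclic.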

Page 39: On-Chip Networks II: Router Microarchitecture & Routing

Key Insight

If routes of flows conform to an acyclic CDG, then there will be no possibility of deadlock!

[Figure: CDG of the mesh with nodes A–F.]

• Disallow/delete certain edges in the CDG
• Edges in the CDG correspond to turns in the network!

Page 40: On-Chip Networks II: Router Microarchitecture & Routing

Acyclic CDG → Deadlock-free routes

Turns could be prohibited ad hoc; all the edges in red are deleted

[Figure: ad-hoc acyclic CDG over nodes A–F.]

Page 41: On-Chip Networks II: Router Microarchitecture & Routing

East-first → Deadlock-free routes

Per the East-First prohibited turns, all the edges in red are deleted

[Figure: East-First turn model and the resulting East-First acyclic CDG over nodes A–F.]

Page 43: On-Chip Networks II: Router Microarchitecture & Routing

Resource Conflicts → Deadlock

[Figure: four nodes (0, 1, 2, 3) connected by channels u, v, w, x, with flows A and B.]

Routing deadlocks in wormhole routing result from structural hazards at router resources, e.g., buffers.

How can structural hazards be avoided? Adding more resources

Page 44: On-Chip Networks II: Router Microarchitecture & Routing

Virtual Channels

• Virtual channels can be used to avoid deadlock by restricting VC allocation

[Figure: mesh with nodes A–F.]

Page 45: On-Chip Networks II: Router Microarchitecture & Routing

CDG and Virtual Channels

[Figure: nodes A, B, E, F, with every link split into two virtual channels in the CDG: AB0/AB1, BE0/BE1, EF0/EF1, FA0/FA1, BA0/BA1, EB0/EB1, FE0/FE1, AF0/AF1.]
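One way to restrict VC allocation is a dateline rule: packets start on VC 0 and move to VC 1 once they cross a chosen point, never going back. The slides do not say which restriction they have in mind, so the dateline choice (the turn from FA to AB), the function names, and the acyclicity check below are all assumptions for illustration. The example reuses the two flows from the "Cycles in CDG" slide.

```python
def assign_vcs(route, dateline=("FA", "AB")):
    """Start on VC 0; switch to VC 1 when the route crosses the dateline turn."""
    vc, out = 0, []
    for i, link in enumerate(route):
        if i > 0 and (route[i - 1], link) == dateline:
            vc = 1
        out.append(f"{link}{vc}")
    return out

def is_acyclic(routes):
    """Peel off CDG vertices with no incoming edge; leftovers imply a cycle."""
    edges = {(a, b) for r in routes for a, b in zip(r, r[1:])}
    nodes = {n for e in edges for n in e}
    while nodes:
        free = {n for n in nodes if not any(b == n for _, b in edges)}
        if not free:
            return False          # every remaining channel waits on another
        nodes -= free
        edges = {(a, b) for a, b in edges if a not in free}
    return True

routes = [["AB", "BE", "EF"], ["EF", "FA", "AB"]]
vc_routes = [assign_vcs(r) for r in routes]
print(is_acyclic(routes), is_acyclic(vc_routes))
```

With the dateline, the second flow ends on AB1 rather than AB0, so the dependence chain no longer returns to where it started for these two flows.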

Page 46: On-Chip Networks II: Router Microarchitecture & Routing

Randomized Routing: Valiant

• Route each packet through a randomly chosen intermediate node

A packet going from node SA to node DA is first routed from SA to a randomly chosen intermediate node IA, before going from IA to its final destination DA.

It helps load-balance the network and has good worst-case performance, at the expense of locality.

[Figure: mesh with sources SA, SB, SC, destinations DA, DB, DC, and intermediate node IA.]
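Valiant routing can be sketched as two back-to-back routes through a random intermediate node. The slides do not say how each phase is routed; this example assumes X-Y dimension-order routing on a `width` by `height` mesh, and the function names are hypothetical.

```python
import random

def xy_path(src, dst):
    """Nodes visited by X-Y dimension-order routing, both endpoints included."""
    pos, path = list(src), [src]
    for d in (0, 1):
        step = 1 if dst[d] > pos[d] else -1
        while pos[d] != dst[d]:
            pos[d] += step
            path.append(tuple(pos))
    return path

def valiant_path(src, dst, width, height, rng=random):
    """Phase 1: src -> random intermediate node; phase 2: intermediate -> dst."""
    mid = (rng.randrange(width), rng.randrange(height))
    return xy_path(src, mid) + xy_path(mid, dst)[1:], mid

path, mid = valiant_path((0, 0), (1, 0), width=4, height=4, rng=random.Random(7))
print(mid, len(path) - 1)    # intermediate node and hop count
```

The source and destination here are neighbours, yet the route may wander across the mesh depending on the intermediate node drawn, which is the loss of locality the slide mentions.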

Page 47: On-Chip Networks II: Router Microarchitecture & Routing

ROMM: Randomized, Oblivious Multi-phase Minimal Routing

To retain locality, choose the intermediate node in the minimal quadrant

Equivalent to randomly selecting among the various minimal paths from source to destination

[Figure: mesh with sources SA, SB, SC and destinations DA, DB, DC.]
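A two-phase version of ROMM can be sketched by drawing the intermediate node from the bounding box spanned by source and destination. As a simplifying assumption, each phase uses X-Y dimension-order routing; the function names are hypothetical.

```python
import random

def xy_path(src, dst):
    """Nodes visited by X-Y dimension-order routing, both endpoints included."""
    pos, path = list(src), [src]
    for d in (0, 1):
        step = 1 if dst[d] > pos[d] else -1
        while pos[d] != dst[d]:
            pos[d] += step
            path.append(tuple(pos))
    return path

def romm_path(src, dst, rng=random):
    """Two phases through a random intermediate node inside the minimal quadrant."""
    xs = sorted((src[0], dst[0]))
    ys = sorted((src[1], dst[1]))
    mid = (rng.randint(xs[0], xs[1]), rng.randint(ys[0], ys[1]))
    return xy_path(src, mid) + xy_path(mid, dst)[1:], mid

path, mid = romm_path((0, 0), (3, 2), rng=random.Random(3))
print(mid, len(path) - 1)    # hop count always equals the Manhattan distance
```

Because the intermediate node lies between source and destination in both dimensions, neither phase ever moves away from the destination, so every path drawn this way is minimal.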

Page 48: On-Chip Networks II: Router Microarchitecture & Routing

Thank you!

Next Lecture: Memory Consistency Models