Quality of Service in Network on Chip
Isask’har (Zigi) Walter
Supervised by: Prof. Israel Cidon, Prof. Ran Ginosar and Dr. Avinoam Kolodny

Dec 21, 2015

Page 1:

Quality of Service in Network on Chip

Isask’har (Zigi) Walter

Supervised by: Prof. Israel Cidon, Prof. Ran Ginosar and Dr. Avinoam Kolodny

Page 2:

February, 2008 NoC Seminar 2

Outline

Network on Chip (NoC) and QNoC
Capacity Allocation (joint work with Zvika Guz)
Hot Modules in Wormhole NoCs
Summary

[Figure: grid of modules and routers with one hot spot (HS)]

Page 3:

System on Chip (SoC) Interconnect

Explosion in the number of modules in a single chip

Networks are replacing system buses
- Low area
- Low power
- Better scalability
- Higher parallelism
- Spatial reuse
- Unicast

Page 4:

QNoC Architecture

Grid topology
Packet-switched
XY routing
Service levels
Wormhole hop-to-hop flow control

[Figure: grid of modules, each attached to a router; legend: Router, Link]

E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, “QoS Architecture and Design Process for Cost-Effective Network on Chip”, Journal of Systems Architecture, 2004
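The XY routing listed above can be illustrated with a minimal sketch. It assumes the common dimension-ordered reading of XY routing (travel along X first, then along Y); the function name `xy_route` and the `(x, y)` coordinate convention are hypothetical and not taken from the slides.

```python
def xy_route(src, dst):
    """Return the grid coordinates visited from src to dst, both inclusive."""
    x, y = src
    path = [(x, y)]
    # Phase 1: move along X until the destination column is reached.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: move along Y until the destination row is reached.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Because every packet between a given pair of routers follows the same path, the set of links used by each flow is known at design time, which is what a per-flow delay analysis needs.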

Page 5:

Wormhole Flow-Control

Flit-based communication
- SL: service level (0/1/2/3)
- Type: head/body/tail flit

Destination appears (only) in the header flit
Each flit must include a Type field

[Figure: a packet divided into flits, each with Type and SL fields; Dest appears in the head flit and data D0 to D7 in the following flits]
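As a rough illustration of the flit format described above, the sketch below splits a payload into a head flit followed by body and tail flits. The `Flit` class, the field names and the choice to carry no data in the head flit are assumptions made for the example, not details confirmed by the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flit:
    flit_type: str           # "HEAD", "BODY" or "TAIL"; present in every flit
    service_level: int       # SL, 0..3
    dest: Optional[int]      # only the head flit carries the destination
    data: Optional[str]      # data word (assumed absent in the head flit)


def packetize(dest: int, service_level: int, payload: List[str]) -> List[Flit]:
    """Split payload into one head flit plus one flit per data word."""
    if not payload:
        raise ValueError("payload must hold at least one data word")
    flits = [Flit("HEAD", service_level, dest, None)]
    for i, word in enumerate(payload):
        kind = "TAIL" if i == len(payload) - 1 else "BODY"
        flits.append(Flit(kind, service_level, None, word))
    return flits


if __name__ == "__main__":
    for flit in packetize(dest=5, service_level=2, payload=["D0", "D1", "D2"]):
        print(flit)
```

Since only the head flit names the destination, routers have to remember the route chosen for the head and apply it to the flits that follow, up to and including the tail.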

Page 6:

Wormhole Routing

Well suited to on-chip interconnect
- Small number of buffers
- Low latency

Virtual Channels allow concurrent transmission of flits on the same link
- Flits of different packets are locally labeled

[Figure: IP1 and IP2 connected through their interfaces]

Page 7:

Quality of Service in QNoC

Defined by throughput and latency requirements
- e.g. interrupts, real time, block transfers
- Implemented using separate buffers (service levels) and a static priority policy

Requirements should be met at low cost
- Design parameters
- Run-time mechanisms

[Figure labels: High Bandwidth, Low Latency]
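The combination of separate buffers per service level and a static priority policy can be sketched as follows. The class name `ServiceLevelPort` and the convention that level 0 is the most urgent are assumptions for illustration only.

```python
from collections import deque


class ServiceLevelPort:
    """One output port with a separate FIFO buffer per service level."""

    def __init__(self, levels=4):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, service_level, flit):
        self.queues[service_level].append(flit)

    def next_flit(self):
        # Static priority: always serve the lowest-numbered non-empty level
        # (level 0 is assumed to be the most urgent).
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


if __name__ == "__main__":
    port = ServiceLevelPort()
    port.enqueue(3, "block-transfer flit")
    port.enqueue(0, "interrupt flit")
    print(port.next_flit())
    print(port.next_flit())
```

A static policy like this needs no per-flow state in the router, at the cost of letting urgent traffic delay lower levels for as long as it keeps arriving.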

Page 8:

QNoC Design Flow

Define inter-module traffic
Place modules
Allocate link capacities
Verify QoS and cost

[Figure: modules placed on a grid of routers]

Page 9:

QNoC Design Flow

Define inter-module traffic
Place modules
Allocate link capacities
Verify QoS and cost

Too low capacity results in poor QoS
Too high capacity wastes power/area

[Figure: modules placed on a grid of routers, shown at several link-capacity settings]

Page 10:

Use Existing Algorithms?…

Efficient algorithms exist for store-and-forward networks

These algorithms are useless for wormhole networks, as they ignore inter-link dependencies

Page 11:

Our Approach

Analytical model to forecast QoS
Capacity allocation algorithm that exploits the model

Z. Guz, I. Walter, E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, “Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip”, accepted to Design, Automation and Test in Europe (DATE), 2006

Z. Guz, I. Walter, E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, “Network Delays and Link Capacities in Application-Specific Wormhole NoCs”, VLSI Design, 2007

Page 12:

Delay Analysis - Goal

Replace extensive simulations with an analytical model to forecast QoS

Approximate per-flow latencies, given:
- Network topology
- Communication demands
- Link capacities

[Figure: router grid with two flows, s1 to d1 and s2 to d2]

Page 13:

Delay Analysis – Prior work 1/4

Though many wormhole analyses exist, they do not fit because they assume:
- symmetrical communication demands
- no virtual channels
- identical link capacity!

Generally, they calculate the delay of an “average flow”
- A per-flow analysis is needed

Page 14:

Delay Analysis – Prior work 2/4

H. Sarbazi-Azad, A. Khonsari and M. Ould-Khaoua, “Performance Analysis of Deterministic Routing in Wormhole K-ary n-cubes with Virtual-Channels”, Journal of Interconnection Networks, 2002

Page 15:

Delay Analysis – Prior work 3/4

Approximate the delay of an “average flow”

H. Sarbazi-Azad, A. Khonsari and M. Ould-Khaoua, “Performance Analysis of Deterministic Routing in Wormhole K-ary n-cubes with Virtual-Channels”, Journal of Interconnection Networks, 2002

Page 16:

Delay Analysis – Prior work 4/4

S. Loucif and M. Ould-Khaoua, “Modeling Latency in Deterministic Wormhole-Routed Hypercubes under Hot-Spot Traffic”, The Journal of Supercomputing, 2004

Page 17:

Wormhole Delay Analysis

Inputs:
- Network topology
- Communication demands
- Links’ capacity

Output:
- Per-flow latencies

Page 18:

Delay Analysis - Basics

Focus on long packets

Packet transmission can be divided into two separate phases:
- Path acquisition
- Flits’ transmission

For simplicity, we assume “enough” VCs on every link
- Path acquisition time is negligible

Page 19:

Main Observation

The delivery resembles a pipeline pass

[Figure: IP1 sending a packet to IP2 through their interfaces]

Page 20:

Packet Delivery Time

The delivery time of long packets is dominated by the slowest link
- Transmission rate
- Link sharing

[Figure: path from IP1 to IP2 that includes a low-capacity link]

Page 21:

Packet Delivery Time

The delivery time of long packets is dominated by the slowest link
- Transmission rate
- Link sharing

[Figure: IP1, IP2 and IP3 connected through their interfaces]

Page 22:

Analysis Basics

Determine the flow’s effective bandwidth
- Per link
- Account for interleaving

[Figure: two timelines along axis t]

Page 23:

Single Hop Flow, no Sharing

$$ t_j^i = \frac{1}{\frac{1}{l} \cdot C_j} $$

- $t_j^i$: mean time to deliver a flit of flow i over link j [sec]
- $C_j$: capacity of link j [bits per sec]
- $l$: flit length [bits/flit]

Page 24:

The Effect of Sharing

H. Sarbazi-Azad, A. Khonsari and M. Ould-Khaoua, “Performance Analysis of Deterministic Routing in Wormhole K-ary n-cubes with Virtual-Channels”, Journal of Interconnection Networks, 2002

Use heuristics to model “flit interleaving delay” of each link on its path

Page 25:

Single Hop Flow, with Sharing

$$ t_j^i = \frac{1}{\frac{1}{l} \cdot C_j - \Lambda_j^i} $$

- $t_j^i$: mean time to deliver a flit of flow i over link j
- $C_j$: capacity of link j [bits per second]
- $l$: flit length [bits/flit]
- $\Lambda_j^i$: total flit injection rate of all flows sharing link j except for flow i [flits/sec], i.e. the bandwidth used by other flows on link j
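A small numeric sketch may help with the single-hop expressions. It reads the definitions above as t = 1 / (C/l − Λ), where Λ is the total flit injection rate of the other flows on the link (zero when the link is not shared). The function name and the example numbers (a 1.6 Gbit/sec link, 16-bit flits) are assumptions chosen for easy arithmetic.

```python
def flit_time(capacity_bps, flit_bits, other_flits_per_sec=0.0):
    """Mean time [sec] to deliver one flit of a flow over a single link."""
    service_rate = capacity_bps / flit_bits          # flits/sec the link can carry
    residual = service_rate - other_flits_per_sec    # flits/sec left for this flow
    if residual <= 0:
        raise ValueError("link is saturated by the other flows")
    return 1.0 / residual


if __name__ == "__main__":
    # Unshared link: 1.6e9 / 16 = 1e8 flits/sec.
    print(flit_time(1.6e9, 16))
    # Same link when other flows inject 5e7 flits/sec: half the rate remains.
    print(flit_time(1.6e9, 16, other_flits_per_sec=5e7))
```

In this example the other flows consume half of the link, so the mean flit time of the observed flow doubles.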

Page 26:

The Convoy Effect

Consider inter-link dependencies
- Wormhole backpressure
- Traffic jams down the road

$$ \hat{t}_j^i = t_j^i + \sum_{k \,|\, k \in \pi_j^i} \frac{l \cdot \Lambda_k^i}{C_k} \cdot \frac{t_k^i}{dist^i(j,k)} $$

- The sum accounts for all subsequent hops ($\pi_j^i$: the links that follow link j on the path of flow i)
- $l \cdot \Lambda_k^i / C_k$: link load
- $t_k^i / dist^i(j,k)$: basic delay weighted by distance

Page 27:

Total Packet Transmission Time

Slowest link dominates transmission time

$$ T^i = m^i \cdot \max\left( \hat{t}_j^i \;\middle|\; j \in \pi^i \right) $$

- $m^i$: packet size [flits/packet]
- $\hat{t}_j^i$: mean time to deliver a flit of flow i over link j, including the effect of subsequent hops
- The max over the links $\pi^i$ on the path of flow i accounts for the weakest link
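The two steps of the per-flow model (adjusting each link for congestion further down the path, then taking the slowest link) can be sketched numerically. The sketch reads the convoy expression as: adjusted flit time on link j = basic flit time on j, plus, for every later link k on the path, (link load of k) × (basic flit time on k) / (hop distance from j to k). That reading is reconstructed from the slide annotations and should be treated as an assumption, as should the names `FLIT_BITS`, `basic_time`, `adjusted_times` and `packet_time` and the example numbers.

```python
FLIT_BITS = 16  # assumed flit length l [bits/flit]


def basic_time(link):
    """Single-hop flit time: 1 / (C/l - rate of the other flows)."""
    return 1.0 / (link["C"] / FLIT_BITS - link["lam"])


def adjusted_times(path):
    """Per-link flit times including the load of all subsequent hops.

    path: list of links from source to destination, each a dict with
    C (capacity, bits/sec) and lam (flits/sec injected by other flows).
    """
    t = [basic_time(link) for link in path]
    adjusted = []
    for j in range(len(path)):
        extra = 0.0
        for k in range(j + 1, len(path)):                    # subsequent hops
            load = FLIT_BITS * path[k]["lam"] / path[k]["C"]  # link load
            extra += load * t[k] / (k - j)                    # weighted by distance
        adjusted.append(t[j] + extra)
    return adjusted


def packet_time(path, flits_per_packet):
    """Total transmission time: packet size times the slowest adjusted link."""
    return flits_per_packet * max(adjusted_times(path))


if __name__ == "__main__":
    path = [{"C": 1.6e9, "lam": 0.0}, {"C": 1.6e9, "lam": 5e7}]
    print(adjusted_times(path))
    print(packet_time(path, flits_per_packet=100))
```

In the example the first link is unshared, yet its adjusted flit time doubles because the next hop is half loaded, which is the backpressure effect the slide describes.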

Page 28:

Source Queuing

And finally:

Page 29:

Analysis Validation

Analytical model was validated using simulations
- Different link capacities
- Different communication demands

[Figure: Analysis and Simulation vs. Load; axis labels: Utilization, Normalized Load]

Page 30:

Per-Flow Validation Example

Page 31:

Capacity Allocation Problem

Use the delay analysis to solve an optimization problem

Given:
- System topology and routing
- Each flow’s bandwidth ($f_i$) and delay bound ($T^i_{REQ}$)

Minimize total link capacity:

$$ \min \sum_{e \in E} C_e $$

Such that:

$$ \forall\, \text{flow } i: \; T^i \le T^i_{REQ} $$

Page 32:

Capacity Allocation Algorithm

Greedy, iterative algorithm

For each src-dst pair:
- Use delay model to identify most sensitive link
- Increase its capacity

Repeat until delay requirements are met
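The greedy loop above can be sketched with a deliberately simplified delay model. The sketch assumes that a flow's delay is its packet size times the largest single-hop flit time on its path (no convoy term), and that the "most sensitive link" is simply the slowest link on that path; both are simplifications for illustration and not the authors' algorithm. All names, the fixed capacity step and the example numbers are hypothetical.

```python
FLIT_BITS = 16  # assumed flit length [bits/flit]


def link_time(flow, link, flows, cap):
    """Flit time of `flow` on `link`: 1 / (C/l - rate of the other flows)."""
    others = sum(f["rate"] for f in flows if f is not flow and link in f["path"])
    residual = cap[link] / FLIT_BITS - others
    return 1.0 / residual if residual > 0 else float("inf")


def flow_delay(flow, flows, cap):
    """Packet size times the slowest link on the flow's path."""
    return flow["size"] * max(link_time(flow, k, flows, cap) for k in flow["path"])


def allocate(flows, cap, step):
    """Add `step` bits/sec to bottleneck links until every delay bound holds."""
    cap = dict(cap)
    done = False
    while not done:
        done = True
        for f in flows:
            if flow_delay(f, flows, cap) > f["bound"]:
                slowest = max(f["path"], key=lambda k: link_time(f, k, flows, cap))
                cap[slowest] += step
                done = False
    return cap


if __name__ == "__main__":
    flows = [
        {"path": ["a", "b"], "rate": 1e7, "size": 100, "bound": 3e-6},
        {"path": ["b"], "rate": 1e7, "size": 100, "bound": 3e-6},
    ]
    print(allocate(flows, {"a": 1.6e8, "b": 1.6e8}, step=1.6e8))
```

Only the links that actually limit some flow are enlarged, so the shared link ends up with more capacity than the unshared one instead of every link being sized for the worst case.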

Page 33:

Capacity Allocation – Example#1

A simple 4-by-4 system with uniform traffic pattern and uniform requirements
“Classic” design: 74.4Gbit/sec
Using the delay model and algorithm: 69Gbit/sec
Total capacity reduced by 7%

[Figure: link capacities of the 4-by-4 grid (nodes 00 to 33), before and after optimization]

Page 34:

A More Realistic Case

Page 35:

DVD Decoder - Results

A SoC-like system with specific traffic demands and delay requirements
“Classic” design: 41.8Gbit/sec
Using the algorithm: 28.7Gbit/sec
Total capacity reduced by 30%

[Figure: link capacities of the grid (nodes 00 to 23), before and after optimization]

Page 36:

Cost Reduction by Slack Elimination

Page 37:

Results - Flow Latencies

Page 38:

Example#3 - VOPD Application

Video Object Plane Decoder
“Classic” design: 640Gbit/sec
Using the algorithm: 369Gbit/sec
Total capacity reduced by 40%

Page 39:

Summary

Capacity Allocation
- Simple analytical model, capturing multiple VCs, different link capacities, different communication demands
- Allocation algorithm that reduces network cost

Page 40:

Future Work

Extensions
- Finite number of VCs
- Analytical delay modeling
- Allocation algorithm

New Applications
- Core placement
- Topology selection
- Routing

Page 41:

Outline

NoC and QNoC
Capacity Allocation (joint work with Zvika Guz)
Hot Modules in QNoC
Summary

[Figure: grid of modules and routers with one hot spot (HS)]

Page 42:

Hot-Modules

NoC is designed and dimensioned to meet QoS requirements
- Buffer sizing, routing, router arbitration, link capacities, …

NoC designers cannot tune everything
- Modules typically have limited capacity

High-demand, bandwidth-limited modules create edge bottlenecks
- In SoC, often known in advance
- Off-chip DRAM, on-chip special purpose processor

System performance is strongly affected
- Even if the NoC has infinite bandwidth

Page 43:

Hot Module (HM) in NoC

Wormhole, BE NoC

At high Hot Module utilization, multiple worms “get stuck” in the network

Two problems arise:
- System Performance
- Source Fairness

[Figure: IP (HM) and its interface]

Page 44:

Hot Module Affects the System (Problem #1)

HM is not a local problem. Traffic not destined for the HM suffers too!

[Figure: IP1 (HM), IP2 and IP3 with their interfaces]

Page 45:

Source Fairness Problem (Problem #2)

Multiple locally fair decisions ≠ global fairness

The limited, expensive HM resource isn’t fairly shared

[Figure: HM and its interface]

Page 46:

Saturation (Un)Fairness

A saturated router divides available BW equally between inputs

[Figure: 3-by-3 grid of routers with the HM and eight IPs; the share of HM bandwidth obtained along each path ranges from BW/2 down to BW/216]

Less than 1% of HM BW!
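The way equal division at each saturated router compounds along a path can be shown with exact fractions. The sketch below uses a small hypothetical chain of routers, not the 3-by-3 topology of the slide; the nested-list encoding and the source names A to D are assumptions for illustration.

```python
from fractions import Fraction


def shares(node, bandwidth=Fraction(1)):
    """Split bandwidth equally among the inputs of each router.

    node is either a source name (str) or a list of inputs feeding one
    router; each input is again a source name or a nested router list.
    """
    if isinstance(node, str):
        return {node: bandwidth}
    result = {}
    for child in node:
        result.update(shares(child, bandwidth / len(node)))
    return result


if __name__ == "__main__":
    # Router next to the HM: local source A plus an upstream router, which
    # in turn serves source B plus a router that serves sources C and D.
    tree = ["A", ["B", ["C", "D"]]]
    for source, share in shares(tree).items():
        print(source, share)
```

Every extra router on the way to the hot module divides a source's share again, which is how sources far from the HM end up with a very small fraction of its bandwidth.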

Page 47:

Blocked Output Ports…

Page 48:

Related Work

Hotspot solutions have been comprehensively studied in the last two decades (e.g. Pfister and Norton, 1985; Duato et al., 2005)

Classically, solutions are categorized by the mechanism policy
- Avoidance-based (frequently impossible)
- Detection-based (requires threshold tuning)
- Prevention-based (overhead during light load)

And by the mechanism implementation
- Central arbitration
- Router-based (seem to draw most attention)
- End-to-end flow-control

Page 49:

Router-Based Solutions

Solving HS by routers
- Virtual circuit
- Fair queuing
- Dedicated queues
- Deflective routing
- Packet combining
- Packet dropping
- Backpressure (credit/rate based)
- and more…

Routers can(?) detect congested periods
- Easier in store-and-forward networks

[Figure: router with Input Buffer, X-Bar and Output Buffer]

Page 50:

Router-Based Solutions

QNoC routers are simple

Fast, power and area efficient
- A few buffers
- Efficient routing
- Simple arbitration policy
- No state/flow memory

[Figure: router with Input Buffer, X-Bar and Output Buffer]

Page 51:

Related Work

Examples:
- “Self-Tuned Congestion Control for Multiprocessor Networks”, M. Thottethodi, A. R. Lebeck and S. Mukherjee, HPCA 2000
- “A New Scalable and Cost-Effective Congestion Management Strategy for Lossless Multistage Interconnection Networks”, J. Duato, I. Johnson, J. Flich, F. Naven, P. Garcia and T. Nachiondo, HPCA 2005

A few end-to-end solutions do exist
- Stop-and-wait based
- Do not prevent hotspot effects
- Do not address fairness problem

Page 52:

Our Approach

Problem is not caused by the NoC
- But rather by a congested end-point

Solution should address the root cause
- Not the symptoms

Utilize existing NoC infrastructure

Solve both problems
- Simple and efficient

Page 53:

Hot Module Congestion

During congested periods, sources should not inject packets towards the HM
- They will experience increased delay anyway
- Better to wait at the source, not in the network

Keep routers unmodified!

Page 54:

HM Allocation Control Basics

[Figure: IP1, IP3, IP4 and the hot module IP2 (HM) attached to the NoC through their interfaces; an Allocation Controller at the HM interface; a Control packet]

Page 55:

HM Allocation Control Basics

[Figure: IP1, IP3, IP4 and the hot module IP2 (HM) attached to the NoC through their interfaces; an Allocation Controller at the HM interface; a Control packet]

Page 56:

HM Allocation Control Basics

[Figure: IP1, IP3, IP4 and the hot module IP2 (HM) attached to the NoC through their interfaces; an Allocation Controller at the HM interface; a Control packet]

Page 57:

HM Control Packets

The HM Controller receives all requests and can employ any scheduling policy

Control packets are sent using a high service level
- Bypassing (blocked) data packets!

Credit request packet fields: Dest., Req. Credit, Source
Credit reply packet fields: Dest., Credit, Source

Page 58:

QNoC Router

[Figure: QNoC router block diagram: input ports #1 to #5 and output ports #1 to #5 connected by a cross-bar; buffers with Scheduler, Control and Routing blocks; signals CREDIT, SIGNAL RT, RD/WR and BLOCK]

Page 59:

Enhanced Request packet

The request may include additional data as needed
- payload’s priority, deadline, expiration time, etc.

Credit request packet fields: Dest., Req. Credit, Source, plus optional fields (Priority, Deadline, Expiration, …)

Page 60:

HM Allocation Controller

The HM Allocation Controller is customized according to the system’s requirements

[Figure: HM Access Controller: Credit Requests enter a Requests Decoder and are stored in a Pending Requests Table (SRC, Size, and optional Priority, Deadline, Expiration, … fields); a Local Arbiter feeds a Reply Encoder that emits Credit Replies]
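A minimal sketch of such a controller is given below: sources send credit requests, the controller keeps them in a pending-requests table, and an arbiter decides which source is granted access to the hot module next. Round-robin arbitration is used here as one possible policy; the class and method names and the use of the requested size as the granted credit are assumptions for illustration.

```python
from collections import deque


class HMAllocationController:
    """Pending-requests table with a round-robin arbiter (illustrative only)."""

    def __init__(self, sources):
        self.pending = {s: deque() for s in sources}   # pending requests per source
        self.order = deque(sources)                    # round-robin order

    def request(self, source, size):
        """Record a credit request from `source` for a packet of `size` flits."""
        self.pending[source].append(size)

    def grant(self):
        """Return (source, credit) for the next source in turn, or None."""
        for _ in range(len(self.order)):
            source = self.order[0]
            self.order.rotate(-1)          # this source moves to the back
            if self.pending[source]:
                return source, self.pending[source].popleft()
        return None


if __name__ == "__main__":
    ctl = HMAllocationController(["IP1", "IP3", "IP4"])
    ctl.request("IP1", 200)
    ctl.request("IP1", 200)
    ctl.request("IP4", 200)
    print(ctl.grant())   # IP1 is served first
    print(ctl.grant())   # then IP4, even though IP1 has another request
    print(ctl.grant())   # IP1 again
    print(ctl.grant())   # nothing left
```

Because all requests meet at one place, the arbiter sees every competing source directly, so a source with several queued requests cannot starve the others.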

Page 61:

Further Enhancements

Short packets are not negotiated
Source’s quota is slowly self-refreshing
The mechanism is turned off when the network is not congested
Crediting modules ahead of time hides request-grant latency
- For light-load periods

Page 62:

Not Classic Flow-Control

Flow-control protects destination’s buffer
- A pair-wise protocol

HM access regulation protects the system
- Many-to-one protocol

Page 63:

Results – Synthetic scenario

Hotspot traffic
- All-to-one traffic with all-to-all background traffic

High network capacity
Limited hot module bandwidth
HM controller arbitration: Round-robin

[Figure: grid of modules and routers with one HM]

Page 64:

System Performance

[Figure: Average Packet Latency, without regulation vs. with regulation; annotations: X30, X10]

Page 65:

Hot vs. non-Hot Module Traffic

Using regulation, non-HM traffic latency is drastically reduced

[Figure: Average Packet Latency for HM traffic and background traffic, each with and without regulation; annotation: X40]

Page 66:

Source Fairness

[Figure: curves for Source#5 and Source#16, each with and without regulation; inset: 4-by-4 grid with sources numbered 1 to 16]

Page 67:

Fairness in Saturated Network

Hot-Module Utilization: 99.99%
Regulated Hot-Module Utilization: 98.32%

Simulation results for a 4-by-4 system
- Data packet length: 200 flits
- Control packet length: 2 flits

[Figure: results with no allocation control vs. with allocation control]

Page 68:

MPEG-4 Decoder

Real SoC
Over-provisioned NoC
Two hot-modules

[Figure: modules VU, AU, MED CPU, RAST, SDRAM, SRAM1, SRAM2, IDCT, ADSP, UP SAMP, BAB, RISC; the hot-modules SDRAM and SRAM2 are annotated with 25% and 22% of all traffic]

Page 69:

Results – MPEG-4 Decoder

[Figure: two panels. All traffic: X2 reduction at 80% load. HM/non-HM traffic breakdown: X8 reduction at 80% load]

Page 70:

The HMs are better utilized

Without regulation, the hot-modules are only 60% utilized
- Traffic to one HM blocks the traffic to the other!

Significant differences in BW!

[Figure: bandwidth per flow, no allocation control vs. with allocation control; flows destined at HM1 (1HM1, 2HM1, 3HM1, 4HM1, 9HM1, 10HM1, 11HM1), flows destined at HM2 (8HM2, 10HM2, 11HM2, 12HM2), and Total]

Page 71:

Hot-Module Placement

Page 72:

Future Work

Dynamically set hot-modules
Other scheduling policies at hot-module controller
Single/Multiple control modules for multiple HMs
Effect of placement

Page 73:

Summary

Hot-modules are common in real SoCs

Hot-modules ruin system performance and are not fairly shared
- Even in NoCs with infinite capacity
- The network intensifies the problem
- But can also provide tools for resolving it

Simple mechanism achieves dramatic improvement
- Completely eliminating the HM effects

Hot-Modules, Cool NoCs!

Page 74:

Thank you!

Questions?

[email protected]

Hot-Modules, Cool NoCs!

QNoC Research Group