Architecture and Routing for NoC-based FPGA
Israel Cidon* (*joint work with Roman Gindin and Idit Keidar)

Dec 21, 2015

Transcript
Page 1: Architecture and Routing for NoC-based FPGA

Israel Cidon*

*joint work with Roman Gindin and Idit Keidar

Page 2: One NoC does not fit all!

Israel Cidon - Technion

[Figure: NoC design space. Labels recovered from the diagram: Flexibility; Traffic uncertainty; single application; general-purpose computer; Configuration; chip design; run time; FPGA; SOC; CMP.]

Source: I. Cidon and K. Goossens, in "Networks on Chips", G. De Micheli and L. Benini, Morgan Kaufmann, 2006

Page 3: Field Programmable Gate Array - 101

- Flexible soft logic
  - Configurable logic blocks (CLBs) and routing channels
  - Programmed look-up tables (LUTs)
  - Configurable switching boxes
- Area-, power-, and speed-efficient hard logic
  - Wire and clock infrastructure
  - Special-purpose modules, e.g., CPU, SerDes

Page 4: Challenges for Future FPGA

- Scalability of design methodology
- Dominance of wire delays
  - Already more than 50% of delay
- Power
- Complex communication patterns
- Prototyping for NoC-based SoCs

Page 5: NoC Based FPGA Architecture

[Figure: floorplan of the NoC-based FPGA as a grid of blocks labeled R, CR, CNI, and FR. The FR blocks are labeled SERDES, CPU, DSP, PCI, DRAM, ETH I/F, and D/A A/D. Legend entries: functional unit; routers; NoC for inter-routing; configurable region (user logic); configurable network interface.]

Page 6: Hard or soft NoC?

Why hard
- Interconnect is a performance bottleneck
- Interconnect power
- Part of FPGA infrastructure

Why soft
- Application is not known when the network is built
- Provides maximum flexibility
- Prevents resource lockup

Page 7: Suggested FPGA NoC Architecture

NoC Element                                     Implementation
Wires, repeaters, etc.                          Hard
Routers, including VCs, buffers, QoS support    Hard
Network interfaces                              Soft: Configurable Network Interface (CNI)
Routing algorithm and headers                   Soft: determined in CNI
Routing tables                                  Soft

Page 8: FPGA Routing – Optimization Problem

- Set of applications
  - Different architectures
  - Different traffic patterns
- Implemented on the same chip
- Common efficient NoC

Page 9: The NoC design problem

The cost
- Hard grid links
  - For uniform grids: the capacity of the most congested link
- NoC logic
  - Hard logic for router
  - Soft logic for routing tables, headers, CNIs

Design envelope
- Collection of designs supported by a given programmable chip

The variables
- Number of "hard-coded" wires per link
- Possible configurable routing schemes

Page 10: Routing Schemes: XY

- Very simple logic
- Deadlock free
- Unbalanced: high cost in uniform capacity grids

[Figure: grid with nodes labeled v1 and v2 and a flow labeled f.]
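The imbalance of XY routing can be illustrated with a minimal sketch. The 3x3 grid, the hotspot at (0, 0), and the unit rates are assumptions chosen for illustration, not values from the talk, and `xy_path` and `link_loads` are hypothetical helper names.

```python
from collections import defaultdict

def xy_path(src, dst):
    """Directed links visited when routing along X first, then along Y."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def link_loads(flows):
    """flows maps (src, dst) to a rate; returns the load on each directed link."""
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for link in xy_path(src, dst):
            loads[link] += rate
    return loads

# Assumed example: every node of a 3x3 grid sends one unit to a corner hotspot.
hotspot = (0, 0)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
loads = link_loads(flows)
print(max(loads.values()))               # load on the most congested link
print(loads[((0, 1), (0, 0))], loads[((1, 0), (0, 0))])
```

In this assumed example one of the two links entering the hotspot carries 6 units and the other only 2, so a uniform grid would have to be sized for the heavier link.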

Page 11: Toggle XY (TXY)

- Split packets evenly between XY and YX routes
- Deadlock avoided with 2 VCs
- Near-optimal for symmetric traffic (permutations) [Seo et al. 05; Towles & Dally 02]
- Simple
- Better balanced
- Split routes
- Does not take into account the traffic pattern

[Figure: flow between v1 and v2 split in two, with f/2 marked on each link of the two routes.]
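A minimal sketch of the even split, under the same assumed 3x3 corner-hotspot traffic (an illustration, not data from the talk). The helper names `dor_path` and `txy_loads` are hypothetical; each flow places half of its rate on the XY route and half on the YX route.

```python
from collections import defaultdict

def dor_path(src, dst, order):
    """Links of a dimension-ordered route; order is "xy" or "yx"."""
    pos, links = list(src), []
    for axis in ((0, 1) if order == "xy" else (1, 0)):
        while pos[axis] != dst[axis]:
            nxt = list(pos)
            nxt[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links

def txy_loads(flows):
    """Split every flow evenly between its XY and its YX route."""
    loads = defaultdict(float)
    for (src, dst), rate in flows.items():
        for order in ("xy", "yx"):
            for link in dor_path(src, dst, order):
                loads[link] += rate / 2
    return loads

# Assumed example: every node of a 3x3 grid sends one unit to a corner hotspot.
hotspot = (0, 0)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
loads = txy_loads(flows)
print(max(loads.values()))
```

For this symmetric example the two links entering the hotspot each carry 4 units, compared with 6 and 2 under pure XY routing.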

Page 12: Weighted Schemes

TXY does not always produce the best results.

[Figure: "Capacity vs. XY weight on Toggle XY Routing - for (2,1) and (1,1) hotspots". X-axis: XY fraction (0 to 1, from "YX only" to "XY only"). Y-axis: max capacity (10 to 25). Marked values: 11.94 and 15. Marked points: TXY, Optimum.]

[Figure: "Max. Capacity for graph with two hotspots at (1,1) and (1,2) on 5x5 grid". Panels: grid with optimal weight; grid with equal weight.]

Page 13: WTXY

- Given a traffic pattern, choose the XY/YX ratio that gives the lowest maximum capacity
- Compute the ratio at programming time
- Load it into the Cxy field in the router
- The router chooses the XY route with probability Cxy, otherwise YX
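The steps above can be sketched as a simple sweep over candidate ratios. This is an illustration under stated assumptions, not the authors' procedure: the candidate set (multiples of 1/12), the 3x3 grid with an edge hotspot at (1, 0), and the names `max_capacity`, `best_cxy`, and `choose_route` are all hypothetical. Exact fractions are used so the comparison is free of rounding.

```python
import random
from collections import defaultdict
from fractions import Fraction

def dor_path(src, dst, order):
    """Links of a dimension-ordered route; order is "xy" or "yx"."""
    pos, links = list(src), []
    for axis in ((0, 1) if order == "xy" else (1, 0)):
        while pos[axis] != dst[axis]:
            nxt = list(pos)
            nxt[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links

def max_capacity(flows, cxy):
    """Largest link load when a fraction cxy of each flow is routed XY."""
    loads = defaultdict(Fraction)
    for (src, dst), rate in flows.items():
        for order, share in (("xy", cxy), ("yx", 1 - cxy)):
            for link in dor_path(src, dst, order):
                loads[link] += rate * share
    return max(loads.values())

def best_cxy(flows, steps=12):
    """Programming-time step: pick the candidate ratio with the lowest maximum."""
    candidates = [Fraction(k, steps) for k in range(steps + 1)]
    return min(candidates, key=lambda c: max_capacity(flows, c))

def choose_route(cxy, rng=random):
    """Run-time step: XY with probability cxy, otherwise YX."""
    return "xy" if rng.random() < cxy else "yx"

# Assumed example: 3x3 grid, hotspot on the bottom edge, unit rates.
hotspot = (1, 0)
flows = {((x, y), hotspot): 1
         for x in range(3) for y in range(3) if (x, y) != hotspot}
cxy = best_cxy(flows)
print(cxy, max_capacity(flows, cxy), max_capacity(flows, Fraction(1, 2)))
```

For this assumed traffic the sweep selects Cxy = 1/6 with a maximum link load of 8/3, against 4 for the even split, which illustrates why a traffic-aware weight can beat plain TXY.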

Page 14: TXY, WTXY Limitation

- Traffic split: packets of the same flow take different paths
- Delays may cause out-of-order arrivals
- Re-ordering buffers are costly

Page 15: Ordered Routing Algorithms

- One route per source-destination (S-D) pair
- No traffic splitting

[Figure: side-by-side panels labeled "Unordered Routing" and "Ordered Routing".]

Page 16: Source Toggle XY (STXY)

- The route is a function of the bitwise XOR of the source and destination IDs
- Very simple algorithm
- Maximum capacity is similar to TXY

[Figure: 5x5 grid showing the route used by each source node:]

XY YX XY YX XY

YX YX XY YX

XY YX XY YX XY

YX XY YX XY YX

XY YX XY YX XY
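The slide only states that the route depends on the bitwise XOR of the two IDs; the exact function is not given. The sketch below therefore makes two loudly labeled assumptions: node IDs are row-major (`y * width + x`), and the least significant bit of the XOR selects the route. The names `node_id` and `stxy_order` are hypothetical.

```python
def node_id(x, y, width):
    """Assumed row-major node numbering."""
    return y * width + x

def stxy_order(src, dst, width):
    """Assumed rule: LSB of (src_id XOR dst_id) picks XY (0) or YX (1)."""
    s = node_id(src[0], src[1], width)
    d = node_id(dst[0], dst[1], width)
    return "xy" if ((s ^ d) & 1) == 0 else "yx"

# Route chosen by each source of a 5x5 grid for the destination (1, 1).
width, dst = 5, (1, 1)
for y in range(width):
    row = ["--" if (x, y) == dst else stxy_order((x, y), dst, width).upper()
           for x in range(width)]
    print(" ".join(row))
```

Because the choice depends only on the source and destination, every packet of a given S-D pair follows the same route, while neighbouring sources alternate between XY and YX.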

Page 17: Weighted Ordered Toggle (WOT)

- Route per S-D pair is chosen at programming time
- Each source stores a routing bit for each destination
- Objective: minimize the maximum link capacity
- Optimal route assignment is difficult
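One way to picture the per-source storage is a bit vector indexed by destination ID. This is a sketch under assumptions: the packing into a single integer, the encoding (0 = XY, 1 = YX), and the class name `RoutingBits` are hypothetical and not taken from the talk.

```python
class RoutingBits:
    """One routing bit per destination, packed into an int (0 = XY, 1 = YX)."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.bits = 0  # all destinations default to XY

    def set_route(self, dst_id, order):
        """Written at programming time."""
        if not 0 <= dst_id < self.num_nodes:
            raise ValueError("destination ID out of range")
        if order == "yx":
            self.bits |= 1 << dst_id
        else:
            self.bits &= ~(1 << dst_id)

    def route(self, dst_id):
        """Looked up at run time when a packet is injected."""
        return "yx" if (self.bits >> dst_id) & 1 else "xy"

# Assumed example: a source in a 25-node grid routes YX only to node 7.
table = RoutingBits(25)
table.set_route(7, "yx")
print(table.route(7), table.route(8))
```

With this layout each source needs one bit per destination, so a 25-node grid needs 25 bits per source.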

Page 18: WOT Min-max Route Assignment

- Initial assignment: STXY
- Make changes that reduce the capacity:
  - Find the most loaded link
  - Among the S-D pairs sharing this link, change the one that minimizes the maximum capacity (if possible)
- Sub-optimal
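The iteration above can be sketched as a greedy loop. This is one reading of the slide, not the authors' implementation: the names `max_load` and `wot_assign` are hypothetical, the loop stops when no single flip lowers the maximum, and the example starts from an all-XY assignment for simplicity (the slide starts from STXY; any initial assignment can be passed in).

```python
from collections import defaultdict

def dor_path(src, dst, order):
    """Links of a dimension-ordered route; order is "xy" or "yx"."""
    pos, links = list(src), []
    for axis in ((0, 1) if order == "xy" else (1, 0)):
        while pos[axis] != dst[axis]:
            nxt = list(pos)
            nxt[axis] += 1 if dst[axis] > pos[axis] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links

def max_load(flows, routes):
    """Return (load, link) for the most loaded link under the given routes."""
    loads = defaultdict(float)
    for pair, rate in flows.items():
        for link in dor_path(pair[0], pair[1], routes[pair]):
            loads[link] += rate
    worst = max(loads, key=loads.get)
    return loads[worst], worst

def wot_assign(flows, initial):
    """Greedy min-max: flip one S-D pair on the worst link per iteration."""
    routes = dict(initial)
    best, worst_link = max_load(flows, routes)
    while True:
        candidate = None
        for pair in flows:
            if worst_link not in dor_path(pair[0], pair[1], routes[pair]):
                continue
            flipped = dict(routes)
            flipped[pair] = "yx" if routes[pair] == "xy" else "xy"
            cap, link = max_load(flows, flipped)
            if cap < best:  # keep the flip giving the lowest maximum so far
                best, candidate, cand_link = cap, flipped, link
        if candidate is None:  # no flip helps: stop (possibly sub-optimal)
            return routes, best
        routes, worst_link = candidate, cand_link

# Assumed example: 3x3 grid, corner hotspot, unit rates, all-XY start.
hotspot = (0, 0)
flows = {((x, y), hotspot): 1.0
         for x in range(3) for y in range(3) if (x, y) != hotspot}
routes, best = wot_assign(flows, {pair: "xy" for pair in flows})
print(best)
```

In this assumed example the maximum falls from 6 to 4 in two flips, and each S-D pair still uses a single route, so packets stay in order.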

Page 19: Iteration Demonstration

[Figure: grid with sources S1, S2, S3 and destinations D1, D2, D3.]

Page 20: Benchmarks

- Previous work considers uniform permutations
- Chips have one or more hotspots
  - CPU, on-chip memory, off-chip memory interface
- We use several hot-spot traffic models
- We also use a real-world example

Page 21: Single Hotspot

[Figure: number of links (0 to 20) vs. capacity (0 to 20) for XY, TXY, STXY, WTXY, and WOT.]

[Figure: capacity (0 to 20) vs. location of the hot spot (corner, center, internal, horizontal edge, vertical edge) for XY, TXY, STXY, WTXY, and WOT.]

Page 22: Two Hotspots

[Figure: capacity (15 to 30) vs. minimum distance between the hotspots (1 to 5) for XY, TXY, STXY, WTXY, and WOT. Caption: "Maximum Capacity Design Envelope for various distances between the hotspots for WOT".]

Page 23: Three Hotspots

[Figure: "Maximum capacity vs. Minimum distance between the hotspots". Capacity (20 to 40) vs. minimum distance between the hotspots (1 to 4) for XY, TXY, STXY, WTXY, and WOT.]

Page 24: Mixed Traffic Model

- Three parameters per node
  - A probability to be a hotspot
  - A probability to send data to a hotspot
  - A probability to send data to a non-hotspot
- Average improvement for WOT vs. TXY is 12% and vs. XY is 25%

[Figure: "Performance for Phs = 0.10, Psend,hs = 0.8000, Psend,no,hs = 0.0500, Nsim = 45". Max. capacity (10 to 110) vs. grid size (5 to 9) for XY, TXY, WTXY, STXY, and WOT.]
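A traffic generator for the three parameters could look like the sketch below. The interpretation is an assumption: each node is independently a hotspot with probability `p_hs`, and each source independently sends a unit-rate flow to each hotspot with probability `p_send_hs` and to each other node with probability `p_send_other`. The function name, the unit rates, and the fixed seed are hypothetical.

```python
import random

def mixed_traffic(width, height, p_hs, p_send_hs, p_send_other, seed=0):
    """Draw a random hotspot set and a set of unit-rate flows (assumed model)."""
    rng = random.Random(seed)
    nodes = [(x, y) for y in range(height) for x in range(width)]
    hotspots = {n for n in nodes if rng.random() < p_hs}
    flows = {}
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            p = p_send_hs if dst in hotspots else p_send_other
            if rng.random() < p:
                flows[(src, dst)] = 1.0
    return hotspots, flows

# Parameter values taken from the figure title on this slide.
hotspots, flows = mixed_traffic(5, 5, p_hs=0.10, p_send_hs=0.80, p_send_other=0.05)
print(len(hotspots), len(flows))
```

The resulting `flows` dictionary has the same `(src, dst) -> rate` shape used when evaluating link loads, so several random draws can be averaged per grid size.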

Page 25: Real-World Example

- Based on Bertozzi: video encoder
- Mapping and placement are done manually

Page 26: Real-World Example

Maximum capacity:
- WOT: 1053
- STXY: 1377
- XY: 1539

[Figure: number of links (0 to 15) vs. capacity (81 to 1539) for XY, YX, STXY, and WOT.]
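The relative savings implied by the three maximum-capacity figures above can be worked out directly. The helper name `reduction_percent` is hypothetical; the inputs are the values quoted on this slide.

```python
capacity = {"WOT": 1053, "STXY": 1377, "XY": 1539}

def reduction_percent(new, old):
    """Percentage by which `new` is lower than `old`."""
    return 100.0 * (old - new) / old

for scheme in ("STXY", "XY"):
    saving = round(reduction_percent(capacity["WOT"], capacity[scheme]), 1)
    print(f"WOT vs. {scheme}: {saving}% lower maximum capacity")
```

This gives roughly 23.5% relative to STXY and 31.6% relative to XY.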

Page 27: Summary

- A new NoC-based architecture for FPGA
- A design methodology for this architecture
- WOT routing algorithm
  - Balanced
  - In-order
  - Low cost