Top Banner
Presentation of Designing Efficient Irregular Networks for Heterogeneous Systems-on-Chip by Christian Neeb and Norbert Wehn and Workload Driven Synthesis of Irregular Application Specific NoC's by Ben Meakin
23

Introduction

Jan 14, 2016

Download

Documents

konane

Presentation of Designing Efficient Irregular Networks for Heterogeneous Systems-on-Chip by Christian Neeb and Norbert Wehn and Workload Driven Synthesis of Irregular Application Specific NoC's by Ben Meakin. Introduction. N1. N2. N3. N1. N2. N3. Irregular vs Regular Networks. N4. N5. - PowerPoint PPT Presentation
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Introduction

Presentation of

Designing Efficient Irregular Networks forHeterogeneous Systems-on-Chip

byChristian Neeb and Norbert Wehn

and

Workload Driven Synthesis of Irregular Application Specific NoC's

byBen Meakin

Page 2: Introduction

Introduction

Irregular vs Regular NetworksN1 N2 N3

N4 N5 N6

N7 N8 N9

N1 N2 N3

N4 N5 N6

N7 N8 N9

Advantages: High performance/low power

for specific application Lower hardware cost of

network Disadvantages:

Difficult routing Difficult floor planning/P&R

Advantages: Efficient for general purpose

applications Easy routing Easy floor planning/P&R

Disadvantages: Expensive Under utilized channels Avg. latency not as scalable

Page 3: Introduction

Background and Motivation

NoC power a function of network hops: Pperflit = (buff + xbar + arb)*Hops + wire*D

NoC latency a function of network hops: L = Rpipeline * Hops (clock cycles)

NoC hardware cost a function of router radix: C = sum (k1*radix_n + k2) n = 1....nodes

Minimize hops and router complexity!

Page 4: Introduction

Design of Efficient Irregular Networks for Heterogeneous Systems-on-Chip

Christian Neeb and Norbert WehnUniversity of Kaiserslautern

9th EUROMICRO Conference on Digital System Design 2006

Page 5: Introduction

Application Communication Model

Resource Communication Graph (RCG) Vertex: hardware component Edge: abstract communication channel Edge Weight: average communication rate (bandwidth)

Normalized to physical bandwidth of one channel link

Efficient mapping of task communication to RCG is assumed and is not discussed in this paper

Page 6: Introduction

INOC Architecture

Nodes are tiles consisting of a chip resource and a router

Routers: Wormhole with virtual channels

Round-Robin switch arbitration

Routing: Deterministic: one output channel per destination

address Adaptive: > one output channel per destination address

Page 7: Introduction

Routing in Irregular Topologies

Channel Dependency Graph (CDG) Vertex: channel Edge: pair of channels with a dependency due to

routing function Deadlock free if there are no cycles

Node Numbering Arbitrary in this paper

Channels Increasing / Channels Decreasing Acyclic CDGs Determined by source/destination node numbers

Deadlock Free Routing Route packets on increasing channels, then decreasing

General form of dimension order routing

Page 8: Introduction

Routing in Irregular Topologies

Unsafe paths allowed provided there is a safe path (acyclic) that a packet can escape to Routing table filled via Dijkstra's algorithm

Virtual channels are used to expand routing function using similar rules as dimension order schemes

Page 9: Introduction

Network Generation

Start with a bidirectional connection of all nodes in ascending order Ensures a correct routing path exists

Add shortcuts based on network traffic estimation and RCG Provides an optimal routing path that will be used in

the common case

Page 10: Introduction

Results

Network traffic patterns randomly generated with constraints to model heterogeneous SoC traffic Simulated with 100 different patterns

Variable flit injections rates

Page 11: Introduction

Limitations

Naive implementation of “safe path” increases cost/complexity of network RCG is the most efficient topology More efficient deadlock free solutions may exist

Efficient mapping of TCG to RCG is assumed Can be a difficult problem Network generation could be linked to a specific

programming model or API for simple TCG to RCG mapping

Page 12: Introduction

Workload Driven Synthesis of Irregular Application-Specific NoC's

Ben [email protected]

Page 13: Introduction

Project Objectives

Synthesis tool for INoC's Generation of optimal network topology MCAPI workload specification Synthesis of deadlock free routing

Comparison of irregular to regular NoC performance, power, and cost

Page 14: Introduction

Motivation

CAD for ASICS always needs improvement Reduce design cost, time-to-market, etc. Synthesis of INoCs could be useful in

implementing parallel algorithms in programmable logic

Average router complexity needs to be minimized Asynch NoCs can realize improvement in average case performance

GALS INoCs could be future direction in heterogeneous SoC

Minimize hops to reduce power and latency

Page 15: Introduction

Specification of Workload

Communication channels:

ScalarChan N1 N2 256e9 10 Comm Type Sender Receiver BW (bits/sec) Priority

Optimization Effort:

E = 100 * ((BW / Norm)/2 + (PR / Norm)/2)

Page 16: Introduction

Network Synthesis Algorithm

Sort channels by priority/effort Add all nodes to network For each channel:

if sender has links: while link has not been added

if receiver has links, add link; else increase search depth else recursive call from next sender neighbor node

while not done if network is not correct, add link; else done

Page 17: Introduction

Network Synthesis Example

N1 N2

Max Radix is 3 N1 – N2 added

Highest priority link

Page 18: Introduction

Network Synthesis Example

N1 N2

N3

N4

N1 – N4 added N1 – N3 added N3 – N4 added

Page 19: Introduction

Network Synthesis Example

N1 N2

N3

N4

N5

N6

N7

N1 – N5 added Max radix of N1 reached, so

N2 used as via

N6 – N7 added N5 – N8 added Are we done?

N8

Page 20: Introduction

Network Synthesis Example

N1 N2

N3

N4

N5

N6

N7

N2 – N6 added for correctness Every node must have a path to every other node

N8

Page 21: Introduction

Synthesis of Routing Function

Rename nodes with naming algorithm Depth first, breadth first, root node selection

Most efficient synthesis for application For each node pair in workload

check if shortest path is correct if not, add a virtual channel to make it correct

Most correct synthesis for general purpose For ALL node pairs in network

check if shortest path is correct if not, add a virtual channel to make it correct

Synthesize ROM for each node

Page 22: Introduction

Uses of this Tool

Incorporate with previous MCAPI hardware implementation Allow user to synthesize a working HDL model based

on MCAPI workloads and a set of IP blocks Modern FPGA's could make this a feasible and cost

effective alternative to fabricating ASIC's

Aid for studying optimal topology for known network traffic

Page 23: Introduction

Questions?