A Comprehensive and Accurate Latency Model for Network-on-Chip Performance Analysis

Zhiliang Qian (1), Da-Cheng Juan (2), Paul Bogdan (3), Chi-Ying Tsui (1), Diana Marculescu (2) and Radu Marculescu (2)

(1) The Hong Kong University of Science and Technology, Hong Kong
(2) Carnegie Mellon University, Pittsburgh, U.S.A.
(3) University of Southern California, Los Angeles, U.S.A.

IEEE/ACM ASP-DAC'14, 22 Jan. 2014, Singapore



Outline

Introduction

NoC Modeling for Performance Analysis

    NoC end-to-end delay calculation

    Link dependency analysis

    GE-type traffic modeling

    Wormhole router based NoC latency model

Experimental results

    Simulation setup

    Evaluation under synthetic traffic patterns

    Evaluation under realistic benchmarks

Conclusion


Network-on-Chips (NoCs)

As technology scales down, more and more components can be integrated on a single chip.

[Figure: evolution of on-chip interconnect with technology scaling]

Technology node      1.0 um                         0.25 um        <0.05 um
Building blocks      Cells                          FBs            IPs
Interconnect         Point-to-point interconnect    Shared bus     Communication network (NoC)
Total wire length    <100 cm                        <100 meters    >1 km

Managing the communication among on-chip resources efficiently plays a key role in future system design.


NoC design space exploration

A large design space needs to be explored to find an optimal design:

    Task mapping, allocation, buffer sizing, routing algorithm, etc.

Accurate and fast performance evaluation is required during the exploration

    -> analytical performance evaluation model

[Figure: NoC design flow. The application (traffic pattern, injection rate) and the NoC platform (topology, router design) feed task scheduling/mapping, core mapping, routing allocation and traffic analysis. In the inner loop, the router model drives performance evaluation and design space exploration, with performance feedback. In the outer loop, simulation/prototyping provides detailed performance evaluation. An example task graph (vld, RLD, inverse scan, AC/DC, iQuant, stripe memory, idct, up samp, ARM, VOP padding) is annotated with communication volumes.]


Introduction: queuing-theory-based analytical model

Queuing-theory-based delay estimation

Customer (packet) arrival process

System (server) service process

Number of servers

Service discipline (FCFS, Round-robin etc.)

System time and waiting time

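[Figure: a queue followed by a server. The system time of a customer is its waiting time in the queue plus its service time.]

[Figure: Bernoulli injection process. Probability density of the inter-arrival time $x_t$: $f(x_t) = \lambda e^{-\lambda x_t}$, with $\lambda = 1/T$.]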


Queuing-theory-based NoC latency model

Prior art and motivation of this work

[Table: NoC latency models]

                 Previous NoC analytical models                                      This work
                 [VLSI 2007]    [TCAD'12, ICCAD'09]    [TVLSI'13]        [NoCs'11]
Traffic model for the application
  Queue          M/M/1          M/G/1/K                G/G/1             M/M/m/K        G/G/1/K
  Arrival        Poisson        Poisson                General           Poisson        General
  Service        Markov         General                General           Markov         General
NoC architecture modeled
  Buffer         Small          K packets              B flits           Small          B flits
  PB ratio (1)   m (>> 1)       < 1                    arbitrary         m (>> 1)       arbitrary
  Arbitration    Round robin    Round robin            Fixed priority    Round robin    Round robin

(1) The PB ratio is the ratio of the average packet size ($m$ flits) to the buffer depth ($B$ flits).


Input to the NoC latency model

The application has been scheduled and mapped onto the NoC.

A deterministic routing algorithm is used to avoid deadlock.

[Figure: the application task graph (T1-T11) goes through task scheduling and allocation, mapping and floorplanning, and routing algorithm design. The resulting routing paths (Path 1-3 between Tile A and Tile B) form the input to the latency model.]

For each flow $f$, the input to the latency model consists of:

$P_f$: the set of links in the routing path

$(\lambda_f, C_f)$: the GE-type traffic model

Performance metric: the average latency $L = \left( \sum_{f \in F} \lambda_f \times L_f \right) / \sum_{f \in F} \lambda_f$
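As a rough illustration of the metric above, the rate-weighted average latency can be sketched in Python. The flow names, rates and latencies are made-up values for demonstration only, not data from the talk.

```python
def average_latency(flows):
    """Rate-weighted mean latency: L = sum(lam_f * L_f) / sum(lam_f)."""
    total_rate = sum(rate for rate, _ in flows.values())
    if total_rate <= 0:
        raise ValueError("total injection rate must be positive")
    weighted = sum(rate * latency for rate, latency in flows.values())
    return weighted / total_rate


# Hypothetical flows: name -> (lambda_f in packets/cycle, L_f in cycles).
example_flows = {
    "f_0_8": (0.02, 40.0),
    "f_3_5": (0.01, 25.0),
    "f_6_2": (0.01, 55.0),
}
avg = average_latency(example_flows)
print(f"average latency = {avg:.2f} cycles")
```

Weighting by $\lambda_f$ means that heavily loaded flows dominate the reported average.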


NoC end-to-end delay calculation

The end-to-end latency $L_{s,d}$ of a specific flow $f_{s,d}$ consists of three parts: $L_{s,d} = v_s + \eta_{s,d} + h_{s,d}$

The queuing time at the source: $v_s$

The packet transfer time in the path: $\eta_{s,d} = m + 1 + \sum_{i=1}^{d_f} \eta_{l_i^f}$

The path acquisition time: $h_{s,d} = \sum_{i=1}^{d_f} h_{l_i^f}$
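The three-part decomposition can be sketched as a small Python function. This is only an illustration: it assumes that $d_f$ is the number of links on the path of the flow, and the per-link values below are hypothetical.

```python
def flow_latency(v_s, eta_links, h_links, m):
    """L_{s,d} = v_s + eta_{s,d} + h_{s,d} for one flow.

    eta_links and h_links hold the per-link flit transfer times and path
    acquisition times along the routing path (one entry per link).
    """
    if len(eta_links) != len(h_links):
        raise ValueError("need one eta and one h value per link")
    eta_sd = m + 1 + sum(eta_links)  # packet transfer time in the path
    h_sd = sum(h_links)              # path acquisition time
    return v_s + eta_sd + h_sd


# Hypothetical 5-link path with a 4-flit packet.
latency = flow_latency(
    v_s=3.0,
    eta_links=[2.0, 2.0, 2.0, 2.0, 2.0],
    h_links=[0.5, 1.0, 0.0, 1.5, 0.0],
    m=4,
)
print(f"L_sd = {latency} cycles")
```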

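[Figure: (a) a 3 × 3 mesh with nodes 0-8; (b) the routing path of flow $f_{0,8}$, starting from the source queue $v_0$ at the local port; (c) the links $l_1^f$ to $l_5^f$ on the path; (d) the path acquisition time $h_{l_2}$, the flit transfer time $\eta_{l_2}$ and the queue $q_{l_2}$ on link $l_2$ between routers R1 and R2, for the current flow and a contention flow.]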


Link dependency graph

Building the channel (link) dependency graph

[Figure: a core communication graph (cores C2, C3, C4, C6, C7) is mapped onto a 3 × 3 NoC mesh (P0-P8, with directed links L0_1, L1_0, ..., L8_7). Together with the routing paths and the traffic model, this yields the channel dependency graph (CDG) of the application communication, whose nodes are links (e.g. L1_4, L4_5, L5_2) and whose edges are dependencies between links.]


Link dependency analysis

A topological sort is applied to the obtained CDG to find the proper order in which to analyze the queuing delays.

[Figure: a sample channel dependency graph. Each link node carries three parameters ($\eta$, $h$, $s$); source nodes carry the source queuing time $v$.]
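The ordering step can be sketched with Python's standard `graphlib` module. This sketch assumes that a link must be analyzed after the downstream links it feeds into; the slides do not spell out the edge direction, and the two routing paths below are hypothetical.

```python
from graphlib import TopologicalSorter


def analysis_order(paths):
    """Order links so that every link comes after its downstream links.

    paths: list of routing paths, each a list of link names from source
    to destination. Raises graphlib.CycleError if the CDG has a cycle.
    """
    depends_on = {}
    for path in paths:
        for link in path:
            depends_on.setdefault(link, set())
        for upstream, downstream in zip(path, path[1:]):
            # Assumption: the upstream link's delay depends on the downstream one.
            depends_on[upstream].add(downstream)
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical routing paths on a 3 x 3 mesh.
paths = [
    ["L0_1", "L1_4", "L4_7"],
    ["L3_4", "L4_7", "L7_8"],
]
order = analysis_order(paths)
print(order)
```

If the opposite dependency direction applies, reversing the returned list gives the corresponding order.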


Modeling the bursty traffic input

Generalized exponential (GE) distribution

[Figure: GE packet generation. From the packet generating point, a packet follows either the direct branch (probability $1 - \tau$) or the exponential branch.]

[Figure: packet injection rate (packets/cycle) versus time index for Poisson injection (CV = 1) and GE-type injection (CV = 3 and CV = 4).]


GE-type traffic modeling

The GE-type cumulative distribution function (cdf) of the inter-arrival time is: $F(t) = P(X \le t) = 1 - \tau e^{-\tau \lambda t}, \; t \ge 0$

where the parameter $\tau = \frac{2}{1 + C^2}$ and $C^2$ is the squared coefficient of variation.

In this work, we use the GE distribution to model the traffic input of each flow, which is characterized by two parameters:

$\lambda$: the average packet arrival rate (packets/cycle)

$C$: the coefficient of variation of this traffic flow, i.e., $C = \sigma \lambda$, the ratio of the standard deviation $\sigma$ of the packet inter-arrival times to their mean $1/\lambda$.

Accordingly, the GE/G/1/K queuing model is used to analyze the channel waiting time while accounting for the traffic burstiness.
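To make the GE model concrete, the following Python sketch draws inter-arrival times from the cdf above and checks the empirical mean and squared coefficient of variation. The function names, the rate and the sample size are illustrative assumptions. The sketch covers only the bursty case $C^2 \ge 1$, where $\tau \le 1$.

```python
import random


def ge_tau(scv):
    """tau = 2 / (1 + C^2), defined here for C^2 >= 1 so that tau <= 1."""
    if scv < 1.0:
        raise ValueError("this sketch only covers C^2 >= 1")
    return 2.0 / (1.0 + scv)


def sample_ge_interarrival(rate, scv, rng):
    """Draw one inter-arrival time with cdf F(t) = 1 - tau * exp(-tau * rate * t)."""
    tau = ge_tau(scv)
    if rng.random() < tau:
        # Exponential branch: exponential gap with rate tau * lambda.
        return rng.expovariate(tau * rate)
    # Direct branch: zero gap, i.e. the packet arrives in the same burst.
    return 0.0


rng = random.Random(0)
rate, scv = 0.05, 9.0  # lambda = 0.05 packets/cycle, C = 3
samples = [sample_ge_interarrival(rate, scv, rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
est_scv = variance / mean ** 2
print(f"mean gap = {mean:.2f} cycles, estimated C^2 = {est_scv:.2f}")
```

With $C^2 = 1$ the direct branch is never taken and the process reduces to exponential inter-arrival times.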


Flit transfer time 𝜼 calculation

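The flit transfer time $\eta$ of link $l_{ab}$ is defined as the time taken by the header flit, after being granted link access, to reach the front of the buffer in link $l_{ab}$.

[Figure: a packet of $m$ flits crossing the crossbar onto the current link $l_i$, with the flit transfer time $\eta_{l_i}$ and a contention flow.]

The mean flit rate arriving at the buffer is: $\lambda_{l_{ab}} = m \times \lambda_{packet}^{l_{ab}} = m \times \sum_{f \in F_{l_{ab}}} \lambda_f$

The mean time to serve a flit in this queuing system is the weighted average of the service times of all flows $f$ passing through the link $l_{ab}$:

$s_{flit}^{l_{ab}} = \dfrac{\sum_{\forall f \in F_{l_{ab}}} \lambda_f \times \left( \frac{h_{l_{i+1}^f}}{m} + 1 \right)}{\sum_{\forall f \in F_{l_{ab}}} \lambda_f}$

where $\frac{h_{l_{i+1}^f}}{m} + 1$ is the mean service time for flow $f$.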
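The two averages on this slide can be sketched in Python as follows. The sketch assumes the per-flow flit service time reads $h_{l_{i+1}^f}/m + 1$ (one cycle per flit plus the next-link acquisition time amortized over the $m$ flits), which is one reading of the slide; the flow values are hypothetical.

```python
def flit_arrival_rate(m, flow_rates):
    """lambda_lab = m * sum of packet rates of the flows crossing the link."""
    return m * sum(flow_rates)


def mean_flit_service_time(m, flows):
    """Rate-weighted mean of (h_next / m + 1) over the flows on the link.

    flows: list of (lambda_f, h_next) pairs, where h_next is the path
    acquisition time of the flow's next link (assumed reading of the slide).
    """
    total_rate = sum(rate for rate, _ in flows)
    if total_rate <= 0:
        raise ValueError("total flow rate must be positive")
    return sum(rate * (h_next / m + 1.0) for rate, h_next in flows) / total_rate


# Hypothetical link shared by two flows, packets of m = 4 flits.
m = 4
flows = [(0.01, 2.0), (0.03, 6.0)]
lam_link = flit_arrival_rate(m, [rate for rate, _ in flows])
s_flit = mean_flit_service_time(m, flows)
print(f"flit rate = {lam_link:.3f} flits/cycle, mean flit service time = {s_flit:.3f} cycles")
```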


Illustration of path acquisition time

Service time in wormhole NoC to obtain 𝒉:

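[Figure: a finite-size buffer modeled as an M/G/1/K+v queue, showing the link contention time H, the time Q to reach the buffer head, the waiting time W and the service time S.]

The waiting time W includes: 1) the link contention time H and 2) the time Q for the header flit to reach the buffer head.

The service time S is bounded by the time at which the header reaches the node up to which the accumulated buffer space can hold the whole packet (worm).

[Figure: a packet of $m$ flits crossing the crossbar onto the current link $l_i$ in the presence of a contention flow, with points A-D marked along the downstream links and per-link delays such as $\eta_{l_i}$, $\eta_{l_{i+1}} + h_{l_{i+1}}$ and $\eta_{l_{i+2}} + h_{l_{i+2}}$.]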


Path acquisition time 𝒉 calculation

Number of effective subsequent links of link $l$ with respect to the path $p$:

$\Lambda_p(l) = \begin{cases} \frac{m}{B} & \text{if } r(p, l) > \frac{m}{B} \\ r(p, l) & \text{otherwise} \end{cases}$

where $r(p, l)$ is the function that returns the number of remaining hops from link $l$ towards the destination of path $p$.
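The case distinction amounts to taking the smaller of $r(p, l)$ and $m/B$, as the Python sketch below shows. Whether $m/B$ should be rounded to a whole number of links is not recoverable from the slide, so the sketch keeps the raw ratio; the function name is an assumption.

```python
def effective_subsequent_links(m, B, remaining_hops):
    """Lambda_p(l): m / B if r(p, l) > m / B, otherwise r(p, l).

    m: packet length in flits, B: buffer depth in flits,
    remaining_hops: r(p, l), hops left from link l to the destination.
    Assumption: m / B is used unrounded.
    """
    buffers_needed = m / B  # downstream buffers needed to hold the whole packet
    return buffers_needed if remaining_hops > buffers_needed else remaining_hops


# A 9-flit packet with 4-flit buffers spans 2.25 downstream buffers.
far = effective_subsequent_links(m=9, B=4, remaining_hops=5)
near = effective_subsequent_links(m=9, B=4, remaining_hops=2)
print(far, near)
```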

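The service time of link $l$ with respect to the path $p$ is [1]:

$s_{l_i^f} = \begin{cases} \dfrac{m (m + x_{l_i^f}) + 2 x_{l_i^f} m}{m + 2 x_{l_i^f}} & \text{if } x_{l_i^f} < m \\ \dfrac{m (m + x_{l_i^f}) + 2 x_{l_i^f}^2}{m + 2 x_{l_i^f}} & \text{otherwise} \end{cases}$

The channel service time of link $l$:

$\bar{s}_{l_{ab}} = \dfrac{\sum_{\forall f \in F_{l_{ab}}} \lambda_f \times s_{l_i^f}}{\sum_{\forall f \in F_{l_{ab}}} \lambda_f}$

The squared coefficient of variation of the channel service time:

$C_{s_{l_{ab}}}^2 = \dfrac{\overline{s_{l_{ab}}^2}}{\bar{s}_{l_{ab}}^2} - 1 = \left( \dfrac{\sum_{\forall f \in F_{l_{ab}}} \lambda_f \times s_{l_i^f}^2}{\sum_{\forall f \in F_{l_{ab}}} \lambda_f} \right) / \bar{s}_{l_{ab}}^2 - 1$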

[1] P.-C. Hu and L. Kleinrock, An analytical model for wormhole routing with finite size input buffers, 15th International Teletraffic Congress, 1998
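The channel service time and its squared coefficient of variation are both rate-weighted moments of the per-flow service times. A minimal Python sketch follows; the per-flow values are hypothetical and the function name is an assumption.

```python
def channel_service_stats(flows):
    """Return (mean service time, squared coefficient of variation) of a link.

    flows: list of (lambda_f, s_f) pairs, where s_f is the service time of
    the link with respect to flow f. Both moments are weighted by lambda_f.
    """
    total_rate = sum(rate for rate, _ in flows)
    if total_rate <= 0:
        raise ValueError("total flow rate must be positive")
    mean = sum(rate * s for rate, s in flows) / total_rate
    second_moment = sum(rate * s * s for rate, s in flows) / total_rate
    scv = second_moment / mean ** 2 - 1.0
    return mean, scv


# Hypothetical link shared by two flows with different service times.
s_mean, s_scv = channel_service_stats([(0.01, 10.0), (0.03, 20.0)])
print(f"mean service time = {s_mean:.2f} cycles, C_s^2 = {s_scv:.4f}")
```

When all flows see the same service time, the squared coefficient of variation is zero, i.e. the service is deterministic.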


GE/G/1/K queue based 𝒉 calculation

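Diffusion approximation for the steady-state probability distribution $P_n$ of the M/G/1/K queue with arrival rate $\lambda$ and service rate $\mu$:

$P_n = \begin{cases} c \times p'_n & (0 \le n \le K) \\ 1 - \dfrac{1 - c \left( 1 - \frac{\lambda}{\mu} \right)}{\frac{\lambda}{\mu}} & (n = K + 1) \end{cases}$

where the normalization constant $c = \left( 1 - \frac{\lambda}{\mu} \left( 1 - \sum_{j=0}^{K} p'_j \right) \right)^{-1}$ and $p'_n$ is the steady-state probability of the M/G/1/$\infty$ queue [2].

Applying Little's formula to obtain the waiting time: $h'_{l} = \left( \sum_{i=1}^{K+1} i \times P_i \right) / \lambda$

The arrival traffic burstiness is taken into account in the GE/G/1/K model by refining the result of the M/G/1/K queue:

$h_{l_{ab}} = \dfrac{C_{s_{l_{ab}}}^2 + C_{a_{l_{ab}}}^2}{1 + C_{s_{l_{ab}}}^2} \, h'_{l_{ab}}$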

[2] M. C. Lai, et al., An accurate and efficient performance analysis approach based on queuing model for Network on Chip, in Proceedings of ICCAD, 2009
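The steps on this slide can be traced in a short Python sketch. It takes the infinite-buffer probabilities $p'_0, \ldots, p'_K$ as input and assumes the normalization sum runs over $p'_j$ and that $p'_0 = 1 - \lambda/\mu$. For a sanity check it uses exponential service (M/M/1), where $p'_n = (1 - \rho)\rho^n$ is known in closed form. The function names and numbers are illustrative.

```python
def finite_buffer_probs(p_inf, rho):
    """Map p'_0..p'_K of the infinite-buffer queue to P_0..P_{K+1}.

    rho = lambda / mu. Assumes p_inf[0] == 1 - rho.
    """
    c = 1.0 / (1.0 - rho * (1.0 - sum(p_inf)))  # normalization constant
    probs = [c * p for p in p_inf]
    probs.append(1.0 - (1.0 - c * (1.0 - rho)) / rho)  # state n = K + 1
    return probs


def acquisition_time(probs, lam, scv_service, scv_arrival):
    """Little's formula for h', then the GE/G/1/K burstiness refinement."""
    h_mg1k = sum(i * p for i, p in enumerate(probs)) / lam
    return (scv_service + scv_arrival) / (1.0 + scv_service) * h_mg1k


# Sanity check with exponential service, where p'_n = (1 - rho) * rho**n.
lam, mu, K = 0.05, 0.1, 2
rho = lam / mu
p_inf = [(1.0 - rho) * rho ** n for n in range(K + 1)]
probs = finite_buffer_probs(p_inf, rho)
h = acquisition_time(probs, lam, scv_service=1.0, scv_arrival=1.0)
print([round(p, 4) for p in probs], round(h, 3))
```

With $C_a^2 = 1$ the refinement factor equals one, so the Poisson case falls back to the plain M/G/1/K result.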


Source queuing time 𝒗𝒔

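The source queue is modeled as a GE/G/1/$\infty$ system:

$v_s = \dfrac{\bar{s}_{l_s}}{2} \left( 1 + \dfrac{C_a^2 + \lambda_a \times \frac{(\bar{s}_{l_s} - m)^2}{\bar{s}_{l_s}}}{1 - \lambda_a \times \bar{s}_{l_s}} \right) - \bar{s}_{l_s}$

where the arrival process is characterized by $(\lambda_a, C_a^2)$ in the GE-type traffic model and the service time at the source is represented as $\bar{s}_{l_s}$.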
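A minimal Python sketch of the source queuing time follows. It assumes the formula reads as the mean response time of a GE/G/1 queue minus the service time, with $(\bar{s}_{l_s} - m)^2 / \bar{s}_{l_s}^2$ playing the role of the service-time variability. This is one reading of the slide, and the parameter values are hypothetical.

```python
def source_queuing_time(lam_a, scv_a, s_mean, m):
    """v_s for the GE/G/1/infinity source queue (assumed reading of the slide).

    lam_a: packet arrival rate (packets/cycle), scv_a: C_a^2,
    s_mean: mean service time at the source link (cycles), m: packet length (flits).
    """
    rho = lam_a * s_mean
    if rho >= 1.0:
        raise ValueError("source queue is saturated (lambda_a * s_mean >= 1)")
    service_term = lam_a * (s_mean - m) ** 2 / s_mean
    response = 0.5 * s_mean * (1.0 + (scv_a + service_term) / (1.0 - rho))
    return response - s_mean  # queuing time = response time - service time


# Hypothetical source: Poisson arrivals (C_a^2 = 1), 4-flit packets.
v_s = source_queuing_time(lam_a=0.05, scv_a=1.0, s_mean=10.0, m=4)
print(f"v_s = {v_s:.2f} cycles")
```

The guard on `rho` reflects that the queuing time grows without bound as the source link approaches saturation.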

[Figure: open-loop source queuing measurement setup. A packet source driven by the application traffic generation trace feeds an infinite source buffer at the traffic terminal (source and sink), which injects into the network-on-chip; input and output timing are recorded.]


Proposed NoC latency analysis flow

Analysis steps:

    Link dependency analysis to obtain the link order $G$

    For each link $l_{ab}$ in $G$:

        Calculate the flit transfer time $\eta$

        Calculate the link service time $s$

        Compute the path acquisition time $h$

        Calculate the source queuing time $v$

    Form the latency for each flow in the application

Pseudocode:

    foreach $l_{ab} \in G$
        $(\lambda_{l_{ab}}, C_{a_{l_{ab}}}^2)$ = traffic_model($F_{l_{ab}}$)
        $\eta_{l_{ab}}$ = calculate_transfer_time($\lambda_{l_{ab}}$, $s_{flit}^{l_{ab}}$, $m$, $B$)
        foreach $f \in F_{l_{ab}}$ with $l_i^f = l_{ab}$
            $s_{l_i^f}$ = calculate_link_service_time( )
        end
        $(\bar{s}_{l_{ab}}, C_{s_{l_{ab}}}^2)$ = service_time( )
        if $a \ne b$    // the link is between routers
            $h_{l_{ab}}$ = GE_G_1_K_queue($\lambda_{l_{ab}}$, $C_{a_{l_{ab}}}^2$, $\bar{s}_{l_{ab}}$, $C_{s_{l_{ab}}}^2$, $K$)
        else    // the link is the source link
            $v_a$ = GE_G_1_queue($\lambda_{l_{ab}}$, $C_{a_{l_{ab}}}^2$, $\bar{s}_{l_{ab}}$, $C_{s_{l_{ab}}}^2$)
        endif
    end
    foreach $f \in F$
        $L_{s,d}$ = calculate_flow_latency( )
    end


Simulation setup

The proposed analytical latency model is implemented in MATLAB, and its accuracy is compared against the BookSim simulator.

Each router takes two cycles to route a flit, and the link traversal stage takes one additional cycle.

Different buffer depth ($B$ flits) and packet length ($m$ flits) combinations are evaluated.

Both synthetic traffic and real applications are used:

Random and shuffle traffic on 8 × 8 and 12 × 12 meshes

MMS (Multimedia system)

DVOPD (Video object plane decoder)

MPEG4 (MPEG decoder)

SPECweb99 applications


Evaluation under random traffic patterns

The proposed latency model works for a variety of buffer depth and packet size combinations.

For random traffic, the model predicts the network saturation point with an error of about 5.2%-9.9%.

[Figure: latency (cycles) versus packet injection rate (packets/cycle) under random traffic on 8 × 8 and 12 × 12 meshes; simulation versus proposed model for (B=3, m=14), (B=4, m=9) and (B=9, m=4).]


Evaluation under shuffle traffic patterns

For traffic patterns such as shuffle, a slightly larger error (10.8%-13%) is introduced due to the uneven traffic arrival rates across the channels.

Overall, the analytical model achieves a 70X speedup over simulation for both traffic patterns.

[Figure: latency (cycles) versus packet injection rate (packets/cycle) under shuffle traffic on 8 × 8 and 12 × 12 meshes; simulation versus proposed model for (B=3, m=14), (B=4, m=9) and (B=9, m=4).]


Evaluation under bursty and real traffic

Comparison of Poisson and GE-type traffic injection:

[Figure: packet injection rate (packets/cycle) versus time index for Poisson injection (CV = 1) and GE-type injection (CV = 3 and CV = 4).]

[Figure: latency (cycles) versus packet injection rate (packets/cycle); simulation versus proposed model under Poisson injection (CV = 1) and GE-type injection (CV = 3 and CV = 4).]

Evaluation under real application traces:

[Figure: end-to-end delay (cycles) per flow index in the DVOPD application; simulation versus proposed model.]

[Figure: end-to-end delay (cycles) for DVOPD, VOPD, MMS, MPEG4, Apache, Ocean, Oracle, DVB2 and Sparse; simulation versus proposed model.]


Conclusion

In this work, we propose a new NoC latency model which generalizes the previous work by modeling:

The arrival traffic burstiness

The general service time distribution

The finite buffer depth and arbitrary packet length combinations

A link dependency analysis technique is proposed to determine the order in which the queuing analysis is applied.

The accuracy of the model is demonstrated using both synthetic traffic and real applications.

The proposed analytical model achieves a 70X speedup over simulation with less than 13% error, which benefits the NoC synthesis process.


Thank you!!

Q&A
