A Comprehensive and Accurate Latency Model for Network-on-Chip Performance Analysis
Zhiliang Qian¹, Da-Cheng Juan², Paul Bogdan³, Chi-Ying Tsui¹, Diana Marculescu² and Radu Marculescu²
¹ The Hong Kong University of Science and Technology, Hong Kong
² Carnegie Mellon University, Pittsburgh, U.S.A.
³ University of Southern California, Los Angeles, U.S.A.
IEEE/ACM ASP-DAC’14, 22 Jan. 2014, Singapore
Outline
Introduction
NoC Modeling for Performance Analysis
  NoC end-to-end delay calculation
  Link dependency analysis
  GE-type traffic modeling
  Wormhole router based NoC latency model
Experimental results
  Simulation setup
  Evaluation under synthetic traffic patterns
  Evaluation under realistic benchmarks
Conclusion
Network-on-Chips (NoCs)
As technology scales down, more and more components
can be integrated on a single chip.
[Figure: evolution of on-chip interconnect with technology scaling]
Feature size:       1.0 um                        | 0.25 um     | < 0.05 um
Interconnect:       point-to-point interconnect   | shared bus  | communication network (NoC)
Total wire length:  < 100 cm                      | < 100 m     | > 1 km
Efficient management of the communication among on-chip
resources plays a key role in future system design.
NoC design space exploration
A large design space must be explored to find an optimal design:
task mapping, allocation, buffer sizing, routing algorithm, etc.
Accurate and fast performance evaluation is required during the exploration,
which motivates an analytical performance evaluation model.
[Figure: NoC design flow. Inputs: application (traffic pattern, injection rate) and NoC platform (topology, router design). Inner loop: task scheduling/mapping, router model, performance evaluation, design space exploration, with performance feedback. Outer loop: detailed performance evaluation by simulation/prototyping. An example application task graph (vld, RLD, inverse scan, AC/DC, iQuant, stripe memory, idct, up samp, ARM, VOP padding, ...) with communication volumes on its edges is taken through core mapping, routing allocation and traffic analysis.]
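The inner loop described above can be sketched as follows. This is only a minimal illustration, not the latency model of the talk: the cost function (traffic volume weighted by mesh hop count), the task names, and the traffic volumes are all hypothetical stand-ins for a fast analytical estimate.

```python
from itertools import permutations

def hop_count(a, b, width):
    """Manhattan distance between tiles a and b on a mesh with `width` columns."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)

def estimate_cost(mapping, traffic, width):
    """Placeholder analytical estimate: traffic volume weighted by hop count."""
    return sum(vol * hop_count(mapping[s], mapping[d], width)
               for (s, d), vol in traffic.items())

def explore(tasks, traffic, width, height):
    """Inner loop: score every task-to-tile mapping with the fast estimate."""
    best = None
    for perm in permutations(range(width * height), len(tasks)):
        mapping = dict(zip(tasks, perm))
        cost = estimate_cost(mapping, traffic, width)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best

# Hypothetical 4-task chain mapped onto a 2x2 mesh.
tasks = ["t0", "t1", "t2", "t3"]
traffic = {("t0", "t1"): 10, ("t1", "t2"): 30, ("t2", "t3"): 5}
best_cost, best_mapping = explore(tasks, traffic, width=2, height=2)
```

Exhaustive enumeration is only feasible for tiny examples; the point is that each candidate is scored by a cheap formula rather than by a simulation run.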
Introduction: queuing-theory-based analytical model
Queuing-theory-based delay estimation
Customer (packet) arrival process
System (server) service process
Number of servers
Service discipline (FCFS, round-robin, etc.)
System time and waiting time
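[Figure: a packet A waits in the queue and is then served by the server; system time = waiting time + service time.]
[Figure: Bernoulli injection process; probability density versus time, with 1/T marked on the plot.]
$f(x_t) = \lambda e^{-\lambda x_t}$, with $\lambda = 1/T$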
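As a small worked example of the quantities listed above, the sketch below evaluates the exponential density $f(x_t) = \lambda e^{-\lambda x_t}$ and the textbook M/M/1 waiting, service and system times. The M/M/1 queue is used only because its closed form is simple; the numeric rates are hypothetical.

```python
import math

def exp_pdf(x, mean_interval):
    """Density f(x) = lam * exp(-lam * x) with lam = 1 / mean_interval."""
    lam = 1.0 / mean_interval
    return lam * math.exp(-lam * x)

def mm1_times(arrival_rate, service_rate):
    """Return (waiting, service, system) times for a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    system_time = 1.0 / (service_rate - arrival_rate)
    service_time = 1.0 / service_rate
    waiting_time = system_time - service_time
    return waiting_time, service_time, system_time

T = 10.0  # hypothetical mean inter-arrival time, in cycles
waiting, service, system = mm1_times(arrival_rate=1.0 / T, service_rate=0.25)
```

With these assumed rates the utilization is 0.4, and the system time is the sum of the waiting time and the service time, matching the timeline in the figure.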
Queuing-theory-based NoC latency model
Prior art and motivation for this work
NoC latency model: previous NoC analytical models versus this work

             | [VLSI 2007]  | [TCAD’12, ICCAD’09] | [TVLSI’13]     | [NoCs’11]    | This work
Traffic model for the application
Queue        | M/M/1        | M/G/1/K             | G/G/1          | M/M/m/K      | G/G/1/K
Arrival      | Poisson      | Poisson             | General        | Poisson      | General
Service      | Markov       | General             | General        | Markov       | General
NoC architecture modeled
Buffer       | Small        | K packets           | B flits        | Small        | B flits
PB ratio (1) | m (≫ 1)      | < 1                 | arbitrary      | m (≫ 1)      | arbitrary
Arbitration  | Round robin  | Round robin         | Fixed priority | Round robin  | Round robin

(1) PB ratio is defined as the ratio of average packet size (m flits) to the buffer depth (B flits).
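To illustrate why the arrival assumption in the comparison above matters, the sketch below uses Kingman's well-known approximation for the mean waiting time of a G/G/1 queue. This is not the G/G/1/K model of this work (it ignores the finite buffer), and the squared coefficients of variation `ca2` and `cs2` as well as all numeric values are hypothetical.

```python
def kingman_wait(arrival_rate, mean_service, ca2, cs2):
    """Kingman's G/G/1 approximation of the mean waiting time in the queue.

    ca2, cs2: squared coefficients of variation of the inter-arrival
    and service times (both equal 1 for the M/M/1 case).
    """
    rho = arrival_rate * mean_service
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return (rho / (1.0 - rho)) * ((ca2 + cs2) / 2.0) * mean_service

def pb_ratio(packet_flits, buffer_flits):
    """PB ratio: average packet size over buffer depth, both in flits."""
    return packet_flits / buffer_flits

poisson_wait = kingman_wait(0.1, 4.0, ca2=1.0, cs2=1.0)  # reduces to M/M/1
bursty_wait = kingman_wait(0.1, 4.0, ca2=4.0, cs2=1.0)   # burstier arrivals
```

At the same average load, the burstier arrival process gives a longer estimated wait, which a Poisson-only model would not capture.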
Input to the NoC latency model
The application has been scheduled and mapped onto the NoC.
A deterministic routing algorithm is used to avoid deadlock.
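As one example of a deterministic routing algorithm, the sketch below implements dimension-ordered XY routing on a 2D mesh, which is deadlock-free on meshes. The slide does not name the routing algorithm, so XY routing and the (x, y) tile coordinates are assumptions made for illustration.

```python
def xy_route(src, dst):
    """Return the tiles visited from src to dst, moving along X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def links(path):
    """Directed links (tile pairs) traversed along a path."""
    return list(zip(path, path[1:]))

route = xy_route((0, 0), (2, 1))
used_links = links(route)
```

Because the route is fixed for each source and destination pair, the set of links used by every flow is known once the application has been mapped.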