FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator for NoC Modeling in Full-System Simulations
5/3/2011
Michael K. Papamichael, James C. Hoe, Onur Mutlu
Computer Architecture Lab at Carnegie Mellon
Our work was supported by NSF. We thank Xilinx and Bluespec for their FPGA and tool donations.
*NoC RTL from http://nocs.stanford.edu/router.html
CALCM Computer Architecture Lab at Carnegie Mellon
FIST Approach
View NoC as set of routers/links
Abstract router into black-box
Represent by load-delay curves
Specific to each router configuration and traffic pattern
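A load-delay curve can be pictured as a small table of (load, delay) samples that is queried with the router's current load. The sketch below is only an illustration: the class name `LoadDelayCurve`, the sample values, and the use of linear interpolation between samples are assumptions, not details taken from the talk.

```python
from bisect import bisect_right


class LoadDelayCurve:
    """Hypothetical load-delay curve for one router configuration."""

    def __init__(self, points):
        # points: (load, delay) samples; sorted by load before use.
        pts = sorted(points)
        self.loads = [p[0] for p in pts]
        self.delays = [p[1] for p in pts]

    def delay(self, load):
        # Clamp queries that fall outside the sampled load range.
        if load <= self.loads[0]:
            return self.delays[0]
        if load >= self.loads[-1]:
            return self.delays[-1]
        # Linear interpolation between neighbouring samples (an assumption).
        i = bisect_right(self.loads, load)
        x0, x1 = self.loads[i - 1], self.loads[i]
        y0, y1 = self.delays[i - 1], self.delays[i]
        return y0 + (y1 - y0) * (load - x0) / (x1 - x0)


# Made-up samples: delay in cycles at three load levels.
curve = LoadDelayCurve([(0.0, 4.0), (0.2, 5.0), (0.4, 9.0)])
```

A separate table of this kind would be kept per router configuration and traffic pattern, matching the last bullet above.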
[Figure: mesh NoC of routers (R) and nodes (N); one router (state, logic, buffers) is abstracted into a latency vs. load curve]
FIST Approach
Treat each hop as a set of load-delay curves
Trade-off between model complexity and fidelity
Keep track of load at each node
To track router load, monitor traffic over a window of time
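One way to monitor traffic over a window of time is a sliding window of recent arrivals per router. The following is a minimal sketch under assumptions: the class name `WindowedLoadTracker`, counting load in flits per cycle, and the window length are all illustrative choices rather than the talk's actual mechanism.

```python
from collections import deque


class WindowedLoadTracker:
    """Hypothetical per-router load tracker over the last `window_cycles` cycles."""

    def __init__(self, window_cycles):
        self.window = window_cycles
        self.events = deque()  # (cycle, flits) pairs, oldest first
        self.total = 0         # flits currently inside the window

    def record(self, cycle, flits=1):
        # Note traffic that passed through this router at `cycle`.
        self.events.append((cycle, flits))
        self.total += flits
        self._expire(cycle)

    def load(self, cycle):
        # Average flits per cycle over the window ending at `cycle`.
        self._expire(cycle)
        return self.total / self.window

    def _expire(self, cycle):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= cycle - self.window:
            _, flits = self.events.popleft()
            self.total -= flits


tracker = WindowedLoadTracker(window_cycles=10)
tracker.record(cycle=0, flits=4)
tracker.record(cycle=5, flits=2)
```

The value returned by `load` is what would be used to index the router's load-delay curves.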
[Figure: mesh of routers (R) and nodes (N), each router paired with its own latency vs. load curve]
FIST in Action
Route packet from source to destination
Determine routers that will be traversed
Sum up the delays for each traversed router
Index load-delay curves using current load at each router
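The two steps above can be sketched as a route computation followed by a sum of per-router curve lookups. This is an illustration only: X-then-Y routing on a mesh, the function names, and the linear curve used in the example are assumptions, not the routing or curves used in the talk.

```python
def xy_route(src, dst):
    """Routers visited from src to dst under X-then-Y routing (an assumed policy)."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def packet_delay(src, dst, load_at, curve_at):
    # load_at: router -> current load
    # curve_at: router -> function mapping load to delay in cycles
    return sum(curve_at[r](load_at[r]) for r in xy_route(src, dst))


# Made-up 3x3 mesh: every router shares one linear curve, one router is loaded.
routers = [(x, y) for x in range(3) for y in range(3)]
curve_at = {r: (lambda load: 3 + 10 * load) for r in routers}
load_at = {r: 0.0 for r in routers}
load_at[(1, 0)] = 0.5
delay = packet_delay((0, 0), (2, 1), load_at, curve_at)
```

In this example the packet crosses four routers, and only the loaded router at (1, 0) contributes more than the zero-load delay.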
[Figure: packet routed from source (S) to destination (D) across the mesh; packet delay is the sum of per-router delays]
Outline
Introduction to FIST
FIST-based Network Models
Evaluation
Related Work & Conclusions
Putting FIST Into Context
[Figure: two phases, Train Curves and Use Curves, connected by Feedback]
Network models within full-system simulators
Model network within a broader simulated system
Assign delay to each packet traversing the network
Typically study networks under synthetic traffic patterns
Offline and Online FIST
Offline FIST
Detailed network simulator generates curves offline
Can use synthetic or actual workload traffic
Load curves into FIST and run experiment
[Figure: Detailed Network Model generating curves for FIST]
Online FIST (tolerates dynamic changes in network behavior)
Initialization of curves same as offline
Periodically run detailed network simulator on the side
Compare accuracy and, if necessary, update curves
[Figure: FIST and the Detailed Network Model side by side; FIST provides feedback and receives updated curves]
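The online compare-and-update step might look roughly like the sketch below. Everything here is an assumption made for illustration: the bucketed curve layout, the 5% tolerance, loads normalized to [0, 1], and replacing a bucket with the mean of the observed delays. The talk does not specify how curves are updated.

```python
def bucket_of(load, num_buckets):
    # Map a load in [0, 1] to a curve bucket index.
    return min(int(load * num_buckets), num_buckets - 1)


def retrain_if_needed(curve, samples, tolerance=0.05):
    """curve: list of delays indexed by load bucket (modified in place).
    samples: (load, observed_delay) pairs from the detailed model, delays > 0.
    Returns (mean relative error before any update, whether the curve changed)."""
    if not samples:
        return 0.0, False
    n = len(curve)
    err = sum(abs(curve[bucket_of(l, n)] - d) / d for l, d in samples) / len(samples)
    if err <= tolerance:
        return err, False
    # Replace each sampled bucket with the mean observed delay (an assumed policy).
    sums, counts = {}, {}
    for l, d in samples:
        b = bucket_of(l, n)
        sums[b] = sums.get(b, 0.0) + d
        counts[b] = counts.get(b, 0) + 1
    for b in sums:
        curve[b] = sums[b] / counts[b]
    return err, True


# Made-up starting curve and detailed-model samples.
curve = [4.0, 4.0, 4.0, 4.0]
samples = [(0.1, 4.0), (0.6, 8.0), (0.7, 12.0)]
error_before, updated = retrain_if_needed(curve, samples)
```

Buckets with no samples are left untouched, so the curve only changes where the detailed model actually observed traffic.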
Online Training in Action
[Figure: Actual Latency vs. Estimated Latency over elapsed cycles (in 1000s), shown before and after training. Example with no initial training.]
FIST Applicability
“FIST-Friendly” Networks
Exhibit stable, predictable behavior as load fluctuates
Actual traffic similar to training traffic
FIST Limitations
Depends on fidelity and representativeness of training models
Higher loads and large buffers can limit FIST’s accuracy
High network load → increased packet latency variance
Large buffers → increased range of observed packet latencies
Applying FIST to NoCs
NoCs affected by on-chip limitations and scarce resources
Employ simple routing algorithms
Usually simple deterministic routing
Operate at low loads
NoCs usually over-provisioned to handle worst-case
Have been observed to operate at low injection rates
Small buffers
On-chip abundance of wires reduces buffering requirements
Amount of buffering in NoCs is limited or even eliminated
NoCs are “FIST-Friendly”
FIST Implementations
Software Implementation of FIST (written in C++)
Implements online and offline FIST models
Hardware Implementation (written in Bluespec)
Precisely replicates software-based FIST
Block diagram of architecture
[Figure: block diagram. Packet descriptors (Src, Dest, Size) feed routing logic that picks routers; router elements (load tracker and curve BRAM) handle load tracking and delay queries; partial delays are combined into packet delays (Calculate Delay).]
Peeking Under The Hood
Similar issues arise for load tracking & dynamic training
Tracking Latency
[Figure: a packet (head H, flits 2-7, tail T) travels from S to D through routers R0, R1 and R2; timelines run from cycle 0 to 30]
Store-and-forward: Packet Latency = 30, with per-router Latency = 9, Latency = 14 and Latency = 7
Wormhole-routed: Packet Latency = 30, but the per-router latencies are unclear (?); the figure labels an injection latency and traversal latencies
Use separate injection and traversal latency curves per router
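With two curves per router, a packet estimate could combine one injection lookup with a traversal lookup per remaining hop. The sketch below is an assumption about how the two curve types are combined: the first router on the path is charged injection latency and every later router is charged traversal latency. The function name, curves, and loads are all made up.

```python
def estimate_packet_latency(path, load_at, injection_curves, traversal_curves):
    """path: routers from source to destination, in order.
    Assumes the first router charges injection latency and each later
    router charges traversal latency (an illustrative split)."""
    first, rest = path[0], path[1:]
    latency = injection_curves[first](load_at[first])
    for router in rest:
        latency += traversal_curves[router](load_at[router])
    return latency


# Made-up curves and loads for a three-router path R0 -> R1 -> R2.
path = ["R0", "R1", "R2"]
load_at = {"R0": 0.25, "R1": 0.5, "R2": 0.0}
injection_curves = {r: (lambda load: 6 + 20 * load) for r in path}
traversal_curves = {r: (lambda load: 3 + 10 * load) for r in path}
latency = estimate_packet_latency(path, load_at, injection_curves, traversal_curves)
```

Keeping the curves separate means a router's delay depends on whether the packet enters the network there or merely passes through.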
Methodology
Examined online and offline FIST models
Replaced cycle-accurate NoC model in tiled CMP simulator
Network and system configuration
4x4, 8x8, 16x16 wormhole-routed mesh
Each network node hosts core+coherent L1 and a slice of L2
Multiprogrammed and multithreaded workloads
26 SPEC CPU2006 benchmarks of varying network intensity
8 SPLASH-2 and 2 PARSEC workloads
Traffic generated by cache misses
Consists of control, data and coherence packets
Offline and Online FIST models with two curves per router
Curves represent injection and traversal latency at each router
Initial training using uniform random synthetic traffic
Please see paper for more details!
Accuracy Results (offline)
8x8 mesh using FIST offline model
Average Latency and Aggregate IPC Error
[Figure: Latency Error and IPC Error (in %), y-axis from -15% to 15%, for workload groups MT (SPL/PAR), MP (High), MP (Med), MP (Low)]
IPC Error < 4%
Latency Error < 8%
Accuracy Results (online)
8x8 mesh using FIST online model
Average Latency and Aggregate IPC Error
[Figure: Latency Error and IPC Error (in %), y-axis from -10% to 10%, for workload groups MT (SPL/PAR), MP (High), MP (Med), MP (Low)]
Both Latency and IPC Error below 3%
What about a very simple model?
Very high error for both latency and IPC!
[Figure: Latency Error and IPC Error (in %), y-axis ticks from -80% to 80%, for workload groups MT (SPL/PAR), MP (High), MP (Med), MP (Low)]
8x8 mesh using hop-based model
How does simple network model affect high-order results?
FIST models always within this range
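For contrast, a hop-based model can be as small as the sketch below, which charges a fixed cost per hop and ignores load entirely. The exact hop-based model evaluated in the talk is not described here, so the Manhattan-distance hop count and the 3-cycle per-hop cost are assumptions.

```python
def hop_count(src, dst):
    # Manhattan distance between two mesh coordinates.
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def hop_based_latency(src, dst, cycles_per_hop=3):
    # Fixed cost per hop; network load is never consulted (assumed model).
    return hop_count(src, dst) * cycles_per_hop


# Corner-to-corner packet on an 8x8 mesh.
corner_to_corner = hop_based_latency((0, 0), (7, 7))
```

Because nothing in this estimate depends on load, it returns the same latency whether the network is idle or congested.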