INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY
HAsim On-Chip Network Model Configuration
Michael Adler
Feb 22, 2016
The Front End Multiplexed
[Figure: the time-multiplexed front-end pipeline. Modules FET, BranchPred, LinePred, ITLB, I$, IMEM, PCResolve and InstQ are connected by A-Ports with latencies of 0, 1 or 2. Port labels include vaddr, paddr, pred, training, mispred, fault, redirect and vaddr inputs from the Back End, rspImm / rspDel, inst or fault, enq or drop, and first / deq / slot. Legend: "Ready to simulate?" (CPU 1 / No CPU).]
On-Chip Networks in a Time-Multiplexed World
Problem: On-Chip Network
[Figure: CPU 0, CPU 1 and CPU 2 (each with an L1/L2 $) and Memory Control, each attached to its own router (r); routers and endpoints exchange msg and credit channels. Multiplexed instances are labeled [0 1 2].]
• Problem: wires must be routed to and from every router
• Similar to the "global controller" scheme
• Utilization is also low
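Multiplexing On-Chip Network Routers
[Figure: four routers (Router0 to Router3) replaced by one multiplexed router model (Router0..3) that simulates the current instance (cur). Its "to 1", "to 2" and "to 3" outputs feed back into its "fr 1", "fr 2" and "fr 3" inputs through reorder stages; the three permutations shown are σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4 and σ(x) = (x + 3) mod 4.]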
Simulate the network without a network
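A minimal software sketch of one reorder stage, assuming a single multiplexed router that simulates n router instances in order 0 to n-1 and a link on which instance x always sends to instance σ(x) = (x + k) mod n (k = 1, 2, 3 for n = 4 in the figure). `ReorderForOffset` is a hypothetical name, not a HAsim function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// sent[x] is the message instance x produced during one model cycle.
// Returns the messages in the order the multiplexed router consumes them:
// received[y] is the message destined for instance y = sigma(x) = (x + k) mod n.
template <typename Msg>
std::vector<Msg> ReorderForOffset(const std::vector<Msg>& sent, std::size_t k)
{
    const std::size_t n = sent.size();
    assert(n > 0);
    std::vector<Msg> received(n);
    for (std::size_t x = 0; x < n; ++x)
    {
        received[(x + k) % n] = sent[x];  // sigma(x) = (x + k) mod n
    }
    return received;
}
```

For n = 4 and k = 1, instance 0 receives the message written by instance 3, the last one simulated, so a reorder stage has to hold a full model cycle of messages before it can release them in receiver order.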
On-Chip Network Model Multiplexed Topology
[Figure: multiplexed model topology. Modules: L2 Coherence LLC Hub, Core_OCN_Connection, LLC, Mesh Network and MemoryController. Every connection is a latency-1 A-Port pair made of a data port (portDataEnq / enq) and a matching credit port (cred / credit). Connections shown: CorePvtCache_to_UncoreQ, Uncore_to_CorePvtCacheQ, OCN_LANE_SEND_Core_0..2, OCN_LANE_RECV_Core_0..2, LLCHub_to_LLC_req, LLC_to_LLCHub_rsp, LLC_to_MEM_req, MEM_to_LLC_rsp, Core_OCN_Connection_InQ, Core_OCN_Connection_OutQ, ocn_to_memctrl, memctrl_to_ocn and mesh_interconnect (N, S, E, W).]
HAsim’s Network Model is Abstract
• In a software model, the target network can be built at run time
• Dynamism is expensive in FPGAs, and recompilation is slow
• Solution: constrained dynamism
  – Fixed parameters: max nodes, max edges per node, max VCs
  – Dynamic:
    • Number of active contexts (nodes)
    • Endpoints of each edge (indirection table)
    • Routing table
    • Address mapping of distributed LLC
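A software analogue of this constrained dynamism, with hypothetical limits (`kMaxNodes = 16`, `kMaxEdgesPerNode = 5`, `kMaxLlcMapEntries = 64`) chosen only for illustration: table sizes are fixed at build time, as FPGA storage is fixed at synthesis, while the table contents and the number of active nodes are set at run time. VCs are omitted for brevity, and `NetworkConfig` is not a HAsim type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed parameters: chosen at build time (hypothetical values).
constexpr std::size_t kMaxNodes = 16;
constexpr std::size_t kMaxEdgesPerNode = 5;  // e.g. Local, N, E, S, W
constexpr std::size_t kMaxLlcMapEntries = 64;

// Dynamic parameters: storage is fixed, contents are written once at startup.
struct NetworkConfig
{
    std::size_t activeNodes = 0;  // number of active contexts (nodes)

    // edgeEndpoint[node][edge]: node at the far end of that edge (indirection table)
    std::array<std::array<std::uint8_t, kMaxEdgesPerNode>, kMaxNodes> edgeEndpoint{};

    // route[node][dst]: edge a message at `node` takes toward `dst` (routing table)
    std::array<std::array<std::uint8_t, kMaxNodes>, kMaxNodes> route{};

    // llcAddrMap[i]: node holding LLC segment i (address mapping of the distributed LLC)
    std::array<std::uint8_t, kMaxLlcMapEntries> llcAddrMap{};

    void SetActiveNodes(std::size_t n)
    {
        assert(n <= kMaxNodes);  // run-time choice, bounded by the fixed maximum
        activeNodes = n;
    }

    bool IsActive(std::size_t node) const { return node < activeNodes; }
};
```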
Topology Manager
• Software: runs once at startup, so there is no need to optimize
• HASIM_CHIP_TOPOLOGY_CLASS:
  – Manages streaming of parameters to the FPGA
  – Iterates over all software topology mapping classes until convergence
• Namespace defined by dictionaries
  – .dic files are preprocessed by LEAP tools
  – Hierarchy of enumerated types
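A sketch of "iterate over all mapping classes until convergence", assuming each mapper is a callable that reads and writes a shared parameter table and returns true once all of its outputs are available. Mappers may depend on values produced by other mappers, so the manager repeats full sweeps until every mapper reports done. `TopologyMapper`, `ParamTable` and `RunUntilConvergence` are illustrative names, not the interface of HASIM_CHIP_TOPOLOGY_CLASS.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Shared parameter namespace (stands in for the dictionary-defined names).
using ParamTable = std::map<std::string, int>;

// A mapper returns true when it has produced all of its outputs.
using TopologyMapper = std::function<bool(ParamTable&)>;

// Sweep over all mappers until every one reports completion.
// Returns the number of sweeps taken, or 0 if maxSweeps was exhausted.
std::size_t RunUntilConvergence(const std::vector<TopologyMapper>& mappers,
                                ParamTable& params,
                                std::size_t maxSweeps = 100)
{
    assert(maxSweeps > 0);
    for (std::size_t sweep = 1; sweep <= maxSweeps; ++sweep)
    {
        bool allDone = true;
        for (const TopologyMapper& mapper : mappers)
        {
            // Call every mapper on every sweep so each sees the latest values.
            allDone = mapper(params) && allDone;
        }
        if (allDone) return sweep;
    }
    return 0;
}
```

Because this runs once at startup, a simple repeated sweep is adequate; the sweep limit only guards against a mapper that can never be satisfied.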
How do I…
• Map address ranges to LLC segments?
• Map target cores to nodes?
• Pick a number of memory controllers and map them to nodes?
• Define a target machine network topology?
• Manage interleaving for multiplexing the network and cores?
Map Address Ranges to LLC Segments (SW)
Build a table of n_llc_map_entries entries, where each entry names the network position of the node that holds one portion of the distributed LLC. Entries are assigned to cores round-robin.
icn-mesh.cpp:
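for (int addr_idx = 0; addr_idx < n_llc_map_entries; addr_idx++)
{
    // Entry addr_idx is owned by core (addr_idx % num_cores); send that core's network position.
    bool is_last = (addr_idx + 1 == n_llc_map_entries);  // marks the final entry of the stream
    topology->SendParam(TOPOLOGY_NET_LLC_ADDR_MAP,
                        &cores_net_pos[addr_idx % num_cores],
                        sizeof(TOPOLOGY_VALUE),
                        is_last);
}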
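The round-robin table built above and the low-bits lookup performed on the FPGA can be modeled together. This sketch assumes `n_llc_map_entries` is a power of two, so masking the low address bits gives the same index as truncating the address with `resize`; `BuildLlcAddrMap` and `LlcNodeForLine` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Entry i names the node holding LLC segment i (round-robin over cores).
std::vector<std::uint32_t> BuildLlcAddrMap(const std::vector<std::uint32_t>& cores_net_pos,
                                           std::size_t n_llc_map_entries)
{
    assert(!cores_net_pos.empty());
    // Power of two, so the low-bits lookup below covers every entry.
    assert(n_llc_map_entries != 0 && (n_llc_map_entries & (n_llc_map_entries - 1)) == 0);
    std::vector<std::uint32_t> table(n_llc_map_entries);
    for (std::size_t addr_idx = 0; addr_idx < n_llc_map_entries; ++addr_idx)
    {
        table[addr_idx] = cores_net_pos[addr_idx % cores_net_pos.size()];
    }
    return table;
}

// Index the table with the low bits of the line address.
std::uint32_t LlcNodeForLine(const std::vector<std::uint32_t>& table, std::uint64_t line_addr)
{
    return table[line_addr & (table.size() - 1)];
}
```

The table size controls how evenly lines spread: with 15 cores and 16 entries, for example, core 0 would own entries 0 and 15 and so receive twice the share of lines of any other core.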
Map Address Ranges to LLC Segments (FPGA)
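Consume the table that was streamed in from SW. This excerpt from the LLC shows the same pattern applied to the memory-controller map (TOPOLOGY_NET_MEM_CTRL_MAP).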
last-level-cache-no-coherence.bsv:
// Define a node that will stream in the topology. This builds a node
// on a ring. The node looks for messages tagged TOPOLOGY_NET_MEM_CTRL_MAP
// and emits associated payloads.
let ctrlAddrMapInit <- mkTopologyParamStream(`TOPOLOGY_NET_MEM_CTRL_MAP);

// Allocate a local memory and initialize it with the streamed-in entries.
LUTRAM#(Bit#(TLog#(TMul#(8, MAX_NUM_MEM_CTRLS))), STATION_ID) memCtrlDstForAddr <-
    mkLUTRAMWithGet(ctrlAddrMapInit);

// Map an address to a node ID using the table
function STATION_ID getMemCtrlDstForAddr(LINE_ADDRESS addr);
    // Use the low bits of the address as the index (resize does this).
    return memCtrlDstForAddr.sub(resize(addr));
endfunction
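mkTopologyParamStream is described as a node on a ring that watches for its own tag and emits the matching payloads. The sketch below models that filtering in software, assuming every message carries a tag, a payload and the `is_last` flag that `SendParam` supplies. `ParamMsg`, `ParamStreamNode` and `Broadcast` are hypothetical, not the LEAP implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct ParamMsg
{
    std::string tag;      // e.g. "TOPOLOGY_NET_MEM_CTRL_MAP"
    std::uint32_t value;  // payload
    bool isLast;          // set on the final entry for this tag
};

// One node on the parameter ring: keeps payloads carrying its tag, in arrival order.
class ParamStreamNode
{
  public:
    explicit ParamStreamNode(std::string tag) : tag_(std::move(tag)) {}

    void Observe(const ParamMsg& msg)
    {
        if (msg.tag != tag_) return;  // not ours: ignore
        assert(!done_);               // nothing should arrive after the last entry
        values_.push_back(msg.value);
        done_ = msg.isLast;
    }

    bool Done() const { return done_; }
    const std::vector<std::uint32_t>& Values() const { return values_; }

  private:
    std::string tag_;
    std::vector<std::uint32_t> values_;
    bool done_ = false;
};

// Every message travels past every node on the ring.
void Broadcast(std::vector<ParamStreamNode>& ring, const std::vector<ParamMsg>& msgs)
{
    for (const ParamMsg& msg : msgs)
        for (ParamStreamNode& node : ring)
            node.Observe(msg);
}
```

Entries for different tags may be interleaved on the ring; each node still sees its own entries in the order they were sent, which is the order a table like memCtrlDstForAddr is filled.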
Map Address Ranges to LLC Segments (LLC Hub)
rule . . .
    // Incoming request from core
    if (m_reqFromCore matches tagged Valid .req)
    begin
        // Which instance of the distributed cache is responsible?
        let dst = getLLCDstForAddr(req.physicalAddress);

        if (dst == local_station_id)
        begin
            // Local cache handles the address.
            if (can_enq_reqToLocalLLC &&& ! isValid(m_new_reqToLocalLLC))
            begin
                // Port to LLC is available. Send the local request.
                did_deq_reqFromCore = True;
                m_new_reqToLocalLLC = tagged Valid LLC_MEMORY_REQ { src: tagged Invalid, mreq: req };
                debugLog.record(cpu_iid, $format("1: Core REQ to local LLC, ") + fshow(req));
            end
        end
        else if (can_enq_reqToRemoteLLC && ! isValid(m_new_reqToRemoteLLC))
        begin
            // Remote cache instance handles the address and the OCN request port is available.
            //
            // These requests share the OCN request port since only one type of request goes to
            // a given remote station. Memory stations get memory requests above. LLC stations get
            // core requests here.
            did_deq_reqFromCore = True;
            m_new_reqToRemoteLLC = tagged Valid tuple2(dst, req);
            debugLog.record(cpu_iid, $format("1: Core REQ to LLC %0d, ", dst) + fshow(req));
        end
    end
    . . .
endrule
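Stripped of the port plumbing, the rule makes a three-way choice for each incoming core request. This restatement assumes `localPortFree` stands for `can_enq_reqToLocalLLC && !isValid(m_new_reqToLocalLLC)` and `remotePortFree` for its remote equivalent; `HubAction` and `DispatchCoreRequest` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class HubAction { SendToLocalLlc, SendToRemoteLlc, Stall };

// A request goes to the local LLC when this station owns the address and
// otherwise over the OCN to the owning station. It is dequeued only if the
// required output port is free this cycle; otherwise it waits (Stall).
HubAction DispatchCoreRequest(std::uint32_t dst, std::uint32_t local_station_id,
                              bool localPortFree, bool remotePortFree)
{
    if (dst == local_station_id)
        return localPortFree ? HubAction::SendToLocalLlc : HubAction::Stall;
    return remotePortFree ? HubAction::SendToRemoteLlc : HubAction::Stall;
}
```

A request owned by the local station never falls back to the remote port when the local port is busy; it simply stays at the head of the queue for a later model cycle.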
Map Cores and Memory Controllers to Nodes
• All computed (currently) in icn-mesh.cpp
• Given the number of target cores and the number of memory controllers, it:
  – Builds a rectangle of cores as close to square as possible
  – Adds a row of memory controllers at the top and bottom
  – Streams the topology to the FPGA using the same mechanism as the address mapping
E.g., 15 cores and 3 memory controllers:
x M M x
C C C C
C C C C
C C C C
C C C x
x M x x
(C = core, M = memory controller, x = node with nothing attached)
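A hypothetical reconstruction of the layout step, not the code in icn-mesh.cpp. The width is the smallest w with w * w >= cores, the top row takes the extra controller when the count is odd, and controllers are centered in their row. Those placement rules are assumptions, chosen because they reproduce the 15-core, 3-controller example; `BuildLayout` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns one string per row: 'C' = core, 'M' = memory controller, 'x' = nothing attached.
std::vector<std::string> BuildLayout(std::size_t numCores, std::size_t numMemCtrls)
{
    assert(numCores > 0);
    // Narrowest width whose square holds all cores, then just enough core rows.
    std::size_t width = 1;
    while (width * width < numCores) ++width;
    const std::size_t coreRows = (numCores + width - 1) / width;

    const std::size_t topCtrls = (numMemCtrls + 1) / 2;  // top row gets the extra one
    const std::size_t bottomCtrls = numMemCtrls / 2;
    assert(topCtrls <= width);

    // Center k controllers in a row of `width` nodes.
    auto ctrlRow = [width](std::size_t k) {
        std::string row(width, 'x');
        const std::size_t offset = (width - k) / 2;
        for (std::size_t i = 0; i < k; ++i) row[offset + i] = 'M';
        return row;
    };

    std::vector<std::string> rows;
    rows.push_back(ctrlRow(topCtrls));
    for (std::size_t r = 0; r < coreRows; ++r)
    {
        std::string row(width, 'x');
        for (std::size_t c = 0; c < width && r * width + c < numCores; ++c) row[c] = 'C';
        rows.push_back(row);
    }
    rows.push_back(ctrlRow(bottomCtrls));
    return rows;
}
```

BuildLayout(15, 3) yields the six rows shown above, from "xMMx" at the top to "xMxx" at the bottom.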
Network Topology: Map Cores/Memory Controllers to Nodes
• Multiplexed order of nodes is the same as the order of cores
  – No permutations required for the local port
• Nodes are connected to:
  – Core
  – Memory controller
  – Nothing
• The node doesn't care what is connected!
• Hide the indirection in ports
Network Topology: Map Cores/Memory Controllers to Nodes
In icn-mesh.bsv:
//
// Local ports are a dynamic combination of CPUs, memory controllers, and
// NULL connections.
//
// localPortMap indicates, for each multiplexed port instance ID, the type
// of local port attached (CPU, memory controller, NULL).
//
let localPortInit <- mkTopologyParamStream(`TOPOLOGY_NET_LOCAL_PORT_TYPE_MAP);
LUTRAM#(Bit#(TLog#(TAdd#(TAdd#(MAX_NUM_CPUS, 1), NUM_STATIONS))), Bit#(2)) localPortMap <-
    mkLUTRAMWithGet(localPortInit);

PORT_SEND_MULTIPLEXED#(MAX_NUM_CPUS, OCN_MSG) enqToCores <-
    mkPortSend_Multiplexed("Core_OCN_Connection_InQ_enq");
PORT_SEND_MULTIPLEXED#(MAX_NUM_MEM_CTRLS, OCN_MSG) enqToMemCtrl <-
    mkPortSend_Multiplexed("ocn_to_memctrl_enq");
PORT_SEND_MULTIPLEXED#(NUM_STATIONS, OCN_MSG) enqToNull <-
    mkPortSend_Multiplexed_NULL();

// Steer each station's local message to the CPU, memory controller, or NULL
// port, as selected by localPortMap.
let enqToLocal <- mkPortSend_Multiplexed_Split3(enqToCores, enqToMemCtrl, enqToNull, localPortMap);
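One way to picture `mkPortSend_Multiplexed_Split3` is as a per-station steering table: each station's local output goes to the core, memory-controller or NULL port named in `localPortMap`, and each of those ports numbers its own instances in station order. This is a hypothetical software model; the 2-bit encoding below (0 = core, 1 = memory controller, 2 = NULL) is assumed, not taken from the HAsim dictionaries.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed 2-bit encoding of localPortMap entries.
enum LocalPortType : unsigned { kPortCore = 0, kPortMemCtrl = 1, kPortNull = 2 };

struct LocalTarget
{
    LocalPortType type;     // which sub-port receives the station's message
    std::size_t instance;   // instance ID within that sub-port
};

// For every station (in multiplexed order), compute the sub-port and the
// instance ID on that sub-port. IDs are handed out in station order, so the
// n-th core-type station talks to core n.
std::vector<LocalTarget> SplitLocalPorts(const std::vector<LocalPortType>& localPortMap)
{
    std::array<std::size_t, 3> nextInstance{};  // one counter per sub-port
    std::vector<LocalTarget> targets;
    targets.reserve(localPortMap.size());
    for (LocalPortType type : localPortMap)
    {
        assert(type <= kPortNull);
        targets.push_back({type, nextInstance[type]++});
    }
    return targets;
}
```

With the first two rows of the 15-core layout (x M M x, then C C C C), station 4 maps to core 0, and stations 1 and 2 map to memory controllers 0 and 1.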
Network Topology: Defining Inter-Node Edges
Each network node has five ports: Local, N, E, S and W.
Network Multiplexing
• Logically, there are n nodes in the network.
• Each has a local port connected to a core, to a memory controller, or to nothing.
• The network connection mapping and routing determine the topology.
• The topology manager defines the routing table.
• Note: the dateline (the usual deadlock-avoidance scheme for wraparound links) is not yet implemented.
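The connection mapping can be held as an indirection table that gives, for every node and port, the node at the other end of the edge. This sketch fills such a table for a width x height torus, assuming row-major node numbering with row 0 on the north edge; `BuildTorusEdges` and the `Port` numbering are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

enum Port : std::size_t { kLocal = 0, kNorth = 1, kEast = 2, kSouth = 3, kWest = 4, kNumPorts = 5 };

// neighbor[node][port] = node reached through that port (kLocal maps to the node itself).
// Wraparound edges at the grid boundary make this a torus.
std::vector<std::array<std::size_t, kNumPorts>> BuildTorusEdges(std::size_t width,
                                                                std::size_t height)
{
    assert(width > 0 && height > 0);
    std::vector<std::array<std::size_t, kNumPorts>> neighbor(width * height);
    for (std::size_t r = 0; r < height; ++r)
    {
        for (std::size_t c = 0; c < width; ++c)
        {
            auto& n = neighbor[r * width + c];
            n[kLocal] = r * width + c;
            n[kNorth] = ((r + height - 1) % height) * width + c;
            n[kSouth] = ((r + 1) % height) * width + c;
            n[kEast] = r * width + (c + 1) % width;
            n[kWest] = r * width + (c + width - 1) % width;
        }
    }
    return neighbor;
}
```

The same table can serve other topologies: a mesh keeps these connections and simply never routes over the wraparound edges.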
Network Topology and Routing
Torus:
Network Topology and Routing
Mesh (connections identical, routing table ignores some edges):
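The routing table can be filled from a per-hop policy. The slides do not say which policy HAsim uses; this sketch assumes dimension-order (X, then Y) routing and shows how identical physical connections serve both topologies. With `wrap` set, a route may take the shorter way around through a wraparound edge (torus); without it, those edges are never chosen (mesh). `NextHop` is a hypothetical helper; a table entry would be `route[src][dst] = NextHop(src, dst, ...)`.

```cpp
#include <cassert>
#include <cstddef>

enum Port : std::size_t { kLocal = 0, kNorth = 1, kEast = 2, kSouth = 3, kWest = 4 };

// Dimension-order next hop from node src toward node dst on a width x height
// grid, row-major, with row 0 as the north edge.
Port NextHop(std::size_t src, std::size_t dst, std::size_t width, std::size_t height, bool wrap)
{
    assert(src < width * height && dst < width * height);
    const std::size_t sc = src % width, sr = src / width;
    const std::size_t dc = dst % width, dr = dst / width;

    if (sc != dc)  // resolve X first
    {
        const std::size_t eastHops = (dc + width - sc) % width;  // hops going east, with wrap
        if (wrap) return (eastHops <= width - eastHops) ? kEast : kWest;
        return (dc > sc) ? kEast : kWest;
    }
    if (sr != dr)  // then Y
    {
        const std::size_t southHops = (dr + height - sr) % height;
        if (wrap) return (southHops <= height - southHops) ? kSouth : kNorth;
        return (dr > sr) ? kSouth : kNorth;
    }
    return kLocal;  // arrived
}
```

On a torus, routes that use wraparound edges can form cyclic channel dependencies; a dateline is the usual remedy, and the slides note it is not yet implemented.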
Network Topology and Routing
Bi-directional ring:
Network Topology and Routing
Uni-directional ring:
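The two rings differ only in the routing decision. A minimal sketch, assuming nodes numbered 0 to n-1 around the ring: a uni-directional ring always forwards in one direction, while a bi-directional ring takes the shorter way (ties go clockwise here, an arbitrary choice). How ring directions map onto a node's N/E/S/W ports is not shown in the slides and is left out; `RingNextHop` and `RingHops` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class RingDir { Local, Clockwise, CounterClockwise };

// Next hop on an n-node ring.
RingDir RingNextHop(std::size_t src, std::size_t dst, std::size_t n, bool bidirectional)
{
    assert(src < n && dst < n);
    if (src == dst) return RingDir::Local;
    const std::size_t cwHops = (dst + n - src) % n;
    if (!bidirectional || cwHops <= n - cwHops) return RingDir::Clockwise;
    return RingDir::CounterClockwise;
}

// Number of hops from src to dst under the same policy.
std::size_t RingHops(std::size_t src, std::size_t dst, std::size_t n, bool bidirectional)
{
    assert(src < n && dst < n);
    const std::size_t cwHops = (dst + n - src) % n;
    return (bidirectional && n - cwHops < cwHops) ? n - cwHops : cwHops;
}
```

On an 8-node ring, a message from node 0 to node 7 travels 7 hops on the uni-directional ring but 1 hop on the bi-directional one.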
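Final Problem: Multiplexing On-Chip Network Routers
[Figure: four routers (Router0 to Router3) replaced by one multiplexed router model (Router0..3) that simulates the current instance (cur). Its "to 1", "to 2" and "to 3" outputs feed back into its "fr 1", "fr 2" and "fr 3" inputs through reorder stages; the three permutations shown are σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4 and σ(x) = (x + 3) mod 4.]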
Network Topology: Communication Across Multiplexed Nodes
• Each node talks to a different multiplexed node instance
• Naïve port binding would have each node talk only to itself
• A-Ports are already buffered
• Bury the transformation in A-Ports
• Retain simple "read next / write next" port semantics within models
Network Topology: Communication Across Multiplexed Nodes
icn-mesh.bsv:

// Initialization from topology manager
ReadOnly#(STATION_IID) meshWidth <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_WIDTH);
ReadOnly#(STATION_IID) meshHeight <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_HEIGHT);

// Outbound and inbound ports are loopbacks to the same multiplexed module. Ports connect
// to logically different nodes but physically to the same simulator object.
Vector#(NUM_PORTS, PORT_SEND_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqTo = newVector();
Vector#(NUM_PORTS, PORT_RECV_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqFrom = newVector();

// Outbound port is a normal A-Port. It has no buffering.
enqTo[portEast] <- mkPortSend_Multiplexed("mesh_interconnect_enq_E");

// Inbound port provides buffering for multiplexing. Instead of forwarding messages FIFO
// it must transform the messages so they cross to the correct multiplexed instance when
// instances (nodes) are traversed sequentially.
enqFrom[portWest] <- mkPortRecv_Multiplexed_ReorderLastToFirstEveryN("mesh_interconnect_enq_E", 1, meshWidth, meshHeight);
. . .
enqFrom[portEast] <- mkPortRecv_Multiplexed_ReorderFirstToLastEveryN("mesh_interconnect_enq_W", 1, meshWidth, meshHeight);
enqFrom[portSouth] <- mkPortRecv_Multiplexed_ReorderFirstNToLastN("mesh_interconnect_enq_N", 1, meshWidth);
enqFrom[portNorth] <- mkPortRecv_Multiplexed_ReorderLastNToFirstN("mesh_interconnect_enq_S", 1, meshWidth);
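The four reorder constructors can be read as fixed permutations of one model cycle's outputs. The sketch below is an interpretation, not the A-Port code: it assumes row-major instance order with row 0 on the north edge and wraparound at the grid edges, and it ignores the latency argument (1) and meshHeight. The `Model`-prefixed functions and `Gather` are hypothetical. Under this reading each name matches the geometry; a west input, for example, must hand node c the message written by node c - 1, so the last entry of each row moves to the front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[i] = sent[src(i)]: the message instance i should receive this cycle.
template <typename T, typename SrcIndex>
std::vector<T> Gather(const std::vector<T>& sent, SrcIndex src)
{
    std::vector<T> out(sent.size());
    for (std::size_t i = 0; i < sent.size(); ++i) out[i] = sent[src(i)];
    return out;
}

// West input <- west neighbor's east output: in each row of `width`, last moves to first.
template <typename T>
std::vector<T> ModelLastToFirstEveryN(const std::vector<T>& sent, std::size_t width)
{
    assert(width > 0 && sent.size() % width == 0);
    return Gather(sent, [width](std::size_t i) {
        return i - i % width + (i % width + width - 1) % width;
    });
}

// East input <- east neighbor's west output: in each row, first moves to last.
template <typename T>
std::vector<T> ModelFirstToLastEveryN(const std::vector<T>& sent, std::size_t width)
{
    assert(width > 0 && sent.size() % width == 0);
    return Gather(sent, [width](std::size_t i) {
        return i - i % width + (i % width + 1) % width;
    });
}

// South input <- southern neighbor's north output: first `width` entries move to the end.
template <typename T>
std::vector<T> ModelFirstNToLastN(const std::vector<T>& sent, std::size_t width)
{
    assert(width > 0 && width <= sent.size());
    const std::size_t n = sent.size();
    return Gather(sent, [width, n](std::size_t i) { return (i + width) % n; });
}

// North input <- northern neighbor's south output: last `width` entries move to the front.
template <typename T>
std::vector<T> ModelLastNToFirstN(const std::vector<T>& sent, std::size_t width)
{
    assert(width > 0 && width <= sent.size());
    const std::size_t n = sent.size();
    return Gather(sent, [width, n](std::size_t i) { return (i + n - width) % n; });
}
```

Because instance 0 may need a message written by the last instance of its row (or of the whole grid), the inbound port must buffer a full cycle of messages, which is why the transformation sits in the buffered receive side rather than in the unbuffered send port.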