INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY HAsim On-Chip Network Model Configuration Michael Adler

Dec 16, 2015

Page 1:

INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY

HAsim On-Chip Network Model Configuration

Michael Adler

Page 2:

The Front End Multiplexed

[Figure: time-multiplexed front-end pipeline. Modules: FET, BranchPred, LinePred, IMEM, PCResolve, InstQ, I$, ITLB. Port labels: first, deq, slot, enq or drop, fault, mispred, training, pred, rspImm, rspDel, redirect, vaddr, paddr, inst or fault; some inputs are marked "(from Back End)". Legend: "Ready to simulate?", with markers for CPU 1, CPU 2, and No CPU.]

Page 3:

On-Chip Networks in a Time-Multiplexed World

Page 4:

Problem: On-Chip Network

[Figure: CPU 0, CPU 1, and CPU 2, each with L1/L2 $, plus Memory Control, each attached to a router (r). Routers exchange msg and credit signals. The multiplexed instances are labeled [0 1 2].]

• Problem: routing wires to/from each router
• Similar to the “global controller” scheme
• Also utilization is low

Page 5:

Multiplexing On-Chip Network Routers

[Figure: Router0, Router1, Router2, and Router3 folded into one multiplexed router, Router0..3. Columns: cur, to 1, to 2, to 3, fr 1, fr 2, fr 3, for instances 0 to 3. Each of the three paths passes through a "reorder" stage that applies one of the permutations below.]

σ(x) = (x + 1) mod 4
σ(x) = (x + 2) mod 4
σ(x) = (x + 3) mod 4

Simulate the network without a network
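The reorder stages can be illustrated with a small C++ sketch. It assumes that σ maps a sending router instance to the instance that receives its message, and that messages leave the multiplexed router in sender order 0..n-1. The helper name `reorderBySigma` is hypothetical and does not come from HAsim.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not HAsim code): messages arrive in sender order
// 0..n-1. Place each one in the slot of its receiver, sigma(s) = (s + k) mod n,
// so that a receiver reading slots in instance order sees the right message.
template <typename T>
std::vector<T> reorderBySigma(const std::vector<T>& sentInSenderOrder, std::size_t k)
{
    const std::size_t n = sentInSenderOrder.size();
    std::vector<T> seenInReceiverOrder(n);
    for (std::size_t s = 0; s < n; ++s)
    {
        seenInReceiverOrder[(s + k) % n] = sentInSenderOrder[s];
    }
    return seenInReceiverOrder;
}
```

With four instances and k = 1, the messages from senders 0, 1, 2, 3 come out in the order 3, 0, 1, 2, so receiver 1 reads the message from sender 0 without any wires between routers.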

Page 6:

On-Chip Network Model Multiplexed Topology

[Figure: graph of multiplexed modules connected by labeled ports. The association between labels and modules is not recoverable from the transcript; labels are listed in extraction order.]

Modules:
L2 Coherence LLC Hub
Core_OCN_Connection
LLC
Mesh Network
MemoryController

Edge labels:
OCN_LANE_RECV_Core_1 portDataEnq, 1
OCN_LANE_RECV_Core_2 portDataEnq, 1
CorePvtCache_to_UncoreQ portDataEnq, 1
Uncore_to_CorePvtCacheQ__cred, 1
CorePvtCache_to_UncoreQ cred, 1
Uncore_to_CorePvtCacheQ__portDataEnq, 1
OCN_LANE_RECV_Core_0 cred, 1
OCN_LANE_RECV_Core_1 cred, 1
OCN_LANE_RECV_Core_2 cred, 1
OCN_LANE_SEND_Core_0 portDataEnq, 1
OCN_LANE_SEND_Core_1 portDataEnq, 1
OCN_LANE_SEND_Core_2 portDataEnq, 1
OCN_LANE_RECV_Core_0 portDataEnq, 1
LLC_to_MEM_req cred, 1
MEM_to_LLC_rsp portDataEnq, 1
LLCHub_to_LLC_req__portDataEnq, 1
LLC_to_LLCHub_rsp cred, 1
mesh_interconnect_credit_E, 1
mesh_interconnect_enq_W, 1
mesh_interconnect_enq_S, 1
mesh_interconnect_enq_N, 1
mesh_interconnect_enq_E, 1
mesh_interconnect_credit_W, 1
mesh_interconnect_credit_S, 1
mesh_interconnect_credit_N, 1
Core_OCN_Connection_InQ_credit, 1
Core_OCN_Connection_InQ_enq, 1
ocn_to_memctrl_credit, 1
ocn_to_memctrl_enq, 1
OCN_LANE_SEND_Core_0 cred, 1
OCN_LANE_SEND_Core_1 cred, 1
OCN_LANE_SEND_Core_2 cred, 1
Core_OCN_Connection_OutQ_credit, 1
Core_OCN_Connection_OutQ_enq, 1
LLC_to_MEM_req portDataEnq, 1
MEM_to_LLC_rsp cred, 1
LLCHub_to_LLC_req__cred, 1
LLC_to_LLCHub_rsp portDataEnq, 1
memctrl_to_ocn_credit, 1
memctrl_to_ocn_enq, 1

Page 7:

HAsim’s Network Model is Abstract

• In a software model the target network can be built at run-time
• Dynamism is expensive in FPGAs and recompilation is slow
• Solution: Constrained dynamism
  – Fixed parameters: Max nodes, max edges per node, max VCs
  – Dynamic:
    • Number of active contexts (nodes)
    • Endpoints of each edge (indirection table)
    • Routing table
    • Address mapping of distributed LLC
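As a rough software analogy for constrained dynamism, the C++ sketch below fixes capacity limits at compile time and lets only the table contents change at run time. The limits, the `NetworkConfig` type, and its member names are all assumptions made for illustration; they are not taken from HAsim.

```cpp
#include <array>
#include <cstddef>

// Illustrative limits, fixed when the model is compiled (assumed values).
constexpr std::size_t kMaxNodes = 16;
constexpr std::size_t kMaxEdgesPerNode = 4;

// Run-time topology state held in fixed-size storage.
struct NetworkConfig
{
    std::size_t activeNodes = 0;
    // edgeDst[n][e]: node reached from node n through edge e (indirection table).
    std::array<std::array<std::size_t, kMaxEdgesPerNode>, kMaxNodes> edgeDst{};

    // Returns false when the request exceeds the compiled-in capacity.
    bool setActiveNodes(std::size_t n)
    {
        if (n > kMaxNodes) return false;
        activeNodes = n;
        return true;
    }

    // Rewire one edge by updating the table; storage size never changes.
    bool setEdge(std::size_t node, std::size_t edge, std::size_t dst)
    {
        if (node >= activeNodes || edge >= kMaxEdgesPerNode || dst >= activeNodes)
            return false;
        edgeDst[node][edge] = dst;
        return true;
    }
};
```

Changing the number of active nodes or the endpoint of an edge is a table write, while raising `kMaxNodes` would require a rebuild, which mirrors the fixed/dynamic split listed above.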

Page 8:

Topology Manager

• Software – runs once at startup so no need to optimize
• HASIM_CHIP_TOPOLOGY_CLASS:
  – Manages streaming of parameters to the FPGA
  – Iterates over all software topology mapping classes until convergence
• Namespace defined by dictionaries
  – .dic files are preprocessed by LEAP tools
  – Hierarchy of enumerated types

Page 9:

How do I…

• Map address ranges to LLC segments?
• Map target cores to nodes?
• Pick a number of memory controllers and map them to nodes?
• Define a target machine network topology?
• Manage interleaving for multiplexing the network and cores?

Page 10:

Map Address Ranges to LLC Segments (SW)

Build a table of n_llc_map_entries, where each entry is an index to a portion of the distributed LLC.

icn-mesh.cpp:

    for (int addr_idx = 0; addr_idx < n_llc_map_entries; addr_idx++)
    {
        bool is_last = (addr_idx + 1 == n_llc_map_entries);
        topology->SendParam(TOPOLOGY_NET_LLC_ADDR_MAP,
                            &cores_net_pos[addr_idx % num_cores],
                            sizeof(TOPOLOGY_VALUE),
                            is_last);
    }
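The loop above assigns map entries to cores round-robin (`addr_idx % num_cores`). The standalone sketch below builds the same table in memory instead of streaming it, which makes the resulting contents easy to inspect. The function name `buildLlcAddrMap` and the use of `int` for network positions are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the streamed table (hypothetical helper): entry i holds the
// network position of the LLC slice that owns map index i, chosen round-robin
// over the cores.
std::vector<int> buildLlcAddrMap(const std::vector<int>& coresNetPos, std::size_t nEntries)
{
    assert(!coresNetPos.empty());
    std::vector<int> table;
    table.reserve(nEntries);
    for (std::size_t i = 0; i < nEntries; ++i)
    {
        table.push_back(coresNetPos[i % coresNetPos.size()]);
    }
    return table;
}
```

For three cores at network positions 4, 5, and 6 and eight map entries, the table reads 4, 5, 6, 4, 5, 6, 4, 5. When the entry count is not a multiple of the core count, the first few slices own one more entry than the rest.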

Page 11:

Map Address Ranges to LLC Segments (FPGA)

Consume the table that was streamed in from SW

last-level-cache-no-coherence.bsv:

    // Define a node that will stream in the topology. This builds a node
    // on a ring. The node looks for messages tagged TOPOLOGY_NET_MEM_CTRL_MAP
    // and emits associated payloads.
    let ctrlAddrMapInit <- mkTopologyParamStream(`TOPOLOGY_NET_MEM_CTRL_MAP);

    // Allocate a local memory and initialize it with the streamed-in entries.
    LUTRAM#(Bit#(TLog#(TMul#(8, MAX_NUM_MEM_CTRLS))), STATION_ID) memCtrlDstForAddr <-
        mkLUTRAMWithGet(ctrlAddrMapInit);

    // Map an address to a node ID using the table
    function STATION_ID getMemCtrlDstForAddr(LINE_ADDRESS addr);
        // Use the low bits of the address as the index (resize does this).
        return memCtrlDstForAddr.sub(resize(addr));
    endfunction
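The table lookup indexes with the low bits of the line address. A C++ model of that step, assuming a table with a power-of-two number of entries so that truncation is a simple mask, might look like this. `dstForAddr` is a hypothetical name and the `int` node IDs are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical software model of the lookup: keep only the low bits of the
// line address, as many as are needed to index the table, and return the
// destination node stored there. The table size must be a power of two.
int dstForAddr(const std::vector<int>& table, std::uint64_t lineAddr)
{
    const std::size_t n = table.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    return table[static_cast<std::size_t>(lineAddr & (n - 1))];
}
```

With a four-entry table, addresses 0x1006 and 0x2 both select entry 2, so consecutive line addresses are spread across the entries and the pattern repeats every four lines.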

Page 12:

Map Address Ranges to LLC Segments (LLC Hub)

    rule . . .
        // Incoming request from core
        if (m_reqFromCore matches tagged Valid .req)
        begin
            // Which instance of the distributed cache is responsible?
            let dst = getLLCDstForAddr(req.physicalAddress);

            if (dst == local_station_id)
            begin
                // Local cache handles the address.
                if (can_enq_reqToLocalLLC &&& ! isValid(m_new_reqToLocalLLC))
                begin
                    // Port to LLC is available. Send the local request.
                    did_deq_reqFromCore = True;
                    m_new_reqToLocalLLC = tagged Valid LLC_MEMORY_REQ { src: tagged Invalid,
                                                                        mreq: req };
                    debugLog.record(cpu_iid, $format("1: Core REQ to local LLC, ") + fshow(req));
                end
            end
            else if (can_enq_reqToRemoteLLC && ! isValid(m_new_reqToRemoteLLC))
            begin
                // Remote cache instance handles the address and the OCN request port is available.
                //
                // These requests share the OCN request port since only one type of request goes to
                // a given remote station. Memory stations get memory requests above. LLC stations get
                // core requests here.
                did_deq_reqFromCore = True;
                m_new_reqToRemoteLLC = tagged Valid tuple2(dst, req);
                debugLog.record(cpu_iid, $format("1: Core REQ to LLC %0d, ", dst) + fshow(req));
            end
        end
        . . .
    endrule

Page 13:

Map Cores and Memory Controllers to Nodes

• All computed (currently) in icn-mesh.cpp
• Given number of target cores and number of memory controllers:
  – Builds a rectangle of cores as close to square as possible
  – Adds a row of memory controllers at the top and bottom
  – Topology streamed to FPGA using same mechanism as address mapping

E.g., 15 cores and 3 memory controllers (C = core, M = memory controller, x = node with nothing attached):

    x M M x
    C C C C
    C C C C
    C C C C
    C C C x
    x M x x
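One way to arrive at a layout like the example is sketched below in C++. The placement rules here (smallest square-ish width, cores filled row by row, memory controllers centered with the extra one on the top row) are assumptions chosen to match the 15-core example; the actual policy in icn-mesh.cpp may differ. `layoutMesh` and `centeredRow` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Put n copies of mark in a row of width cells, roughly centered.
// 'x' marks a node with nothing attached.
static std::string centeredRow(std::size_t width, std::size_t n, char mark)
{
    assert(n <= width);
    std::string row(width, 'x');
    const std::size_t start = (width - n) / 2;
    for (std::size_t i = 0; i < n; ++i) row[start + i] = mark;
    return row;
}

// Assumed placement policy, not the icn-mesh.cpp implementation.
std::vector<std::string> layoutMesh(std::size_t numCores, std::size_t numMemCtrl)
{
    assert(numCores > 0);
    // Smallest width whose square holds all cores, then just enough rows.
    std::size_t width = 1;
    while (width * width < numCores) ++width;
    const std::size_t coreRows = (numCores + width - 1) / width;

    const std::size_t top = (numMemCtrl + 1) / 2;  // odd controller goes on top
    const std::size_t bottom = numMemCtrl / 2;

    std::vector<std::string> grid;
    grid.push_back(centeredRow(width, top, 'M'));
    for (std::size_t r = 0; r < coreRows; ++r)
    {
        std::string row(width, 'x');
        for (std::size_t c = 0; c < width && r * width + c < numCores; ++c) row[c] = 'C';
        grid.push_back(row);
    }
    grid.push_back(centeredRow(width, bottom, 'M'));
    return grid;
}
```

Calling `layoutMesh(15, 3)` yields the rows `xMMx`, `CCCC`, `CCCC`, `CCCC`, `CCCx`, `xMxx`, the same shape as the example. The sketch asserts rather than handles the case where a controller row would need more cells than the mesh is wide.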

Page 14:

Network Topology: Map Cores/Memory Controllers to Nodes

• Multiplexed order of nodes is the same as order of cores
  – No permutations required for local port
• Nodes are connected to:
  – Core
  – Memory controller
  – Nothing
• The node doesn’t care what is connected!
• Hide indirection in ports

Page 15:

Network Topology: Map Cores/Memory Controllers to Nodes

In icn-mesh.bsv:

    //
    // Local ports are a dynamic combination of CPUs, memory controllers, and
    // NULL connections.
    //
    // localPortMap indicates, for each multiplexed port instance ID, the type
    // of local port attached (CPU, memory controller, NULL).
    //
    let localPortInit <- mkTopologyParamStream(`TOPOLOGY_NET_LOCAL_PORT_TYPE_MAP);
    LUTRAM#(Bit#(TLog#(TAdd#(TAdd#(MAX_NUM_CPUS, 1), NUM_STATIONS))), Bit#(2)) localPortMap <-
        mkLUTRAMWithGet(localPortInit);

    PORT_SEND_MULTIPLEXED#(MAX_NUM_CPUS, OCN_MSG) enqToCores <-
        mkPortSend_Multiplexed("Core_OCN_Connection_InQ_enq");
    PORT_SEND_MULTIPLEXED#(MAX_NUM_MEM_CTRLS, OCN_MSG) enqToMemCtrl <-
        mkPortSend_Multiplexed("ocn_to_memctrl_enq");
    PORT_SEND_MULTIPLEXED#(NUM_STATIONS, OCN_MSG) enqToNull <-
        mkPortSend_Multiplexed_NULL();

    let enqToLocal <- mkPortSend_Multiplexed_Split3(enqToCores, enqToMemCtrl, enqToNull,
                                                    localPortMap);
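The idea of hiding the indirection inside the port can be modeled in a few lines of C++. This is only a behavioral sketch of a three-way split driven by a per-node type table; `SplitPort`, `LocalPortType`, and the `int` message type are assumptions and do not reflect how `mkPortSend_Multiplexed_Split3` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class LocalPortType { Cpu, MemCtrl, Null };

// Hypothetical model: three destinations behind one logical "local" port.
// The sender only names the node; the table decides where the message goes.
struct SplitPort
{
    std::vector<LocalPortType> portMap;   // per-node type, set at startup
    std::vector<int> toCores, toMemCtrl;  // delivered messages, in order
    std::size_t dropped = 0;              // messages sent to NULL ports

    void send(std::size_t node, int msg)
    {
        assert(node < portMap.size());
        switch (portMap[node])
        {
          case LocalPortType::Cpu:     toCores.push_back(msg);   break;
          case LocalPortType::MemCtrl: toMemCtrl.push_back(msg); break;
          case LocalPortType::Null:    ++dropped;                break;
        }
    }
};
```

The router side calls `send(node, msg)` for every node in the same way, which is the sense in which the node does not need to know what is attached to its local port.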

Page 16:

Network Topology: Defining Inter-Node Edges

Each network node:

[Figure: a node with five ports: Local, N, E, S, W.]

Page 17:

Network Multiplexing

• Logically, there are n nodes in the network.
• Each has a local port connected either to a core, to memory or to nothing.
• Network connection mapping and routing will determine the topology.
• Topology manager defines the routing table.
• Note: Dateline not yet implemented

Page 18:

Network Topology and Routing

Torus:

Page 19:

Network Topology and Routing

Mesh (connections identical, routing table ignores some edges):
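To illustrate how a routing table alone can turn torus wiring into a mesh, here is a C++ sketch that fills a table using X-then-Y (dimension-order) routing. Dimension-order routing is a technique chosen for this example, not necessarily what the topology manager uses. The sketch also assumes row-major node numbering with row 0 at the north edge; `Port` and `buildMeshRoutes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Port { Local, North, East, South, West };

// table[src][dst] is the output port taken at src for a message bound for dst.
// Routes move along X first, then Y, and always step toward the destination
// inside the grid, so wrap-around edges are never selected even if wired.
std::vector<std::vector<Port>> buildMeshRoutes(std::size_t width, std::size_t height)
{
    assert(width > 0 && height > 0);
    const std::size_t n = width * height;
    std::vector<std::vector<Port>> table(n, std::vector<Port>(n, Port::Local));
    for (std::size_t src = 0; src < n; ++src)
    {
        for (std::size_t dst = 0; dst < n; ++dst)
        {
            const std::size_t sx = src % width, sy = src / width;
            const std::size_t dx = dst % width, dy = dst / width;
            if (dx > sx)      table[src][dst] = Port::East;
            else if (dx < sx) table[src][dst] = Port::West;
            else if (dy > sy) table[src][dst] = Port::South;
            else if (dy < sy) table[src][dst] = Port::North;
        }
    }
    return table;
}
```

On a 3 x 2 grid, a message from node 2 to node 0 is sent West across the row rather than East over the wrap-around link, which is how the same connections can behave as a mesh.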

Page 20:

Network Topology and Routing

Bi-directional ring:

Page 21:

Network Topology and Routing

Uni-directional ring:

Page 22:

Final Problem: Multiplexing On-Chip Network Routers

[Figure: Router0, Router1, Router2, and Router3 folded into one multiplexed router, Router0..3. Columns: cur, to 1, to 2, to 3, fr 1, fr 2, fr 3, for instances 0 to 3. Each of the three paths passes through a "reorder" stage that applies one of the permutations below.]

σ(x) = (x + 1) mod 4
σ(x) = (x + 2) mod 4
σ(x) = (x + 3) mod 4

Page 23:

Network Topology: Communication Across Multiplexed Nodes

• Each node talks to a different multiplexed node instance
• Naïve port binding would have each node talk only to itself
• A-Ports are already buffered
• Bury transformation in A-Ports
• Retain simple read next / write next port semantics within models

Page 24:

Network Topology: Communication Across Multiplexed Nodes

icn-mesh.bsv:

    // Initialization from topology manager
    ReadOnly#(STATION_IID) meshWidth  <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_WIDTH);
    ReadOnly#(STATION_IID) meshHeight <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_HEIGHT);

    // Outbound and inbound ports are loopbacks to the same multiplexed module. Ports connect
    // to logically different nodes but physically to the same simulator object.
    Vector#(NUM_PORTS, PORT_SEND_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqTo = newVector();
    Vector#(NUM_PORTS, PORT_RECV_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqFrom = newVector();

    // Outbound port is a normal A-Port. It has no buffering.
    enqTo[portEast] <- mkPortSend_Multiplexed("mesh_interconnect_enq_E");

    // Inbound port provides buffering for multiplexing. Instead of forwarding messages FIFO
    // it must transform the messages so they cross to the correct multiplexed instance when
    // instances (nodes) are traversed sequentially.
    enqFrom[portWest] <-
        mkPortRecv_Multiplexed_ReorderLastToFirstEveryN("mesh_interconnect_enq_E", 1, meshWidth, meshHeight);
    . . .
    enqFrom[portEast] <-
        mkPortRecv_Multiplexed_ReorderFirstToLastEveryN("mesh_interconnect_enq_W", 1, meshWidth, meshHeight);
    enqFrom[portSouth] <-
        mkPortRecv_Multiplexed_ReorderFirstNToLastN("mesh_interconnect_enq_N", 1, meshWidth);
    enqFrom[portNorth] <-
        mkPortRecv_Multiplexed_ReorderLastNToFirstN("mesh_interconnect_enq_S", 1, meshWidth);
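The effect of the west-side receive port can be approximated in C++. This sketch is based only on a reading of the name `ReorderLastToFirstEveryN` and on the assumption that nodes are visited in row-major order with N equal to the mesh width; it is not derived from the port's implementation. East-bound messages are written in sender order, and the receiver at position j in a row should read the message written by position j - 1, with the row's last sender wrapping to its first receiver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: within every consecutive group of n entries, move the
// last entry to the front and shift the others back by one.
template <typename T>
std::vector<T> reorderLastToFirstEveryN(std::vector<T> msgs, std::size_t n)
{
    assert(n > 0 && msgs.size() % n == 0);
    for (std::size_t base = 0; base < msgs.size(); base += n)
    {
        auto first = msgs.begin() + static_cast<std::ptrdiff_t>(base);
        auto last = first + static_cast<std::ptrdiff_t>(n);
        // std::rotate makes the element at (last - 1) the new first element.
        std::rotate(first, last - 1, last);
    }
    return msgs;
}
```

For a 4 x 2 mesh with messages tagged by sender, the input 0..7 becomes 3, 0, 1, 2, 7, 4, 5, 6, so node 1 reads what node 0 sent east and node 5 reads what node 4 sent. Whether the wrapped entry (3 to 0, 7 to 4) is actually used would depend on the routing table.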