INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY HAsim On-Chip Network Model Configuration Michael Adler
HAsim On-Chip Network Model Configuration

Feb 22, 2016

Page 1: HAsim On-Chip Network  Model Configuration

INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY

HAsim On-Chip Network Model Configuration

Michael Adler

Page 2: HAsim On-Chip Network  Model Configuration

The Front End Multiplexed

[Figure: block diagram of the multiplexed front end. Modules: FET, Branch Pred, Line Pred, IMEM, ITLB, I$, PC Resolve, InstQ. Port labels, each annotated with a latency of 0, 1 or 2: redirect, fault, mispred, training, pred, vaddr, paddr, rspImm, rspDel, inst or fault, enq or drop, first, deq, slot. Some ports arrive from the Back End. Legend: "Ready to simulate?" (CPU 1, CPU 2, No), shown for FET and IMEM.]

Page 3: HAsim On-Chip Network  Model Configuration

On-Chip Networks in a Time-Multiplexed World

Page 4: HAsim On-Chip Network  Model Configuration

Problem: On-Chip Network

[Figure: CPU 0, CPU 1 and CPU 2 (each with L1/L2 $) and Memory Control, each attached to a router (r). Routers exchange msg and credit with their neighbors. The multiplexed CPU with L1/L2 $ carries instances [0 1 2].]

• Problem: routing wires to/from each router
• Similar to the “global controller” scheme
• Also utilization is low

Page 5: HAsim On-Chip Network  Model Configuration

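Multiplexing On-Chip Network Routers

[Figure: Router0, Router1, Router2 and Router3 multiplexed onto a single physical router (Router0..3). Columns: cur, to 1, to 2, to 3, fr 1, fr 2, fr 3. Rows: routers 0, 1, 2, 3. Each "to" stream passes through a reorder stage before arriving as a "fr" stream, one permutation per stream:]

σ(x) = (x + 1) mod 4
σ(x) = (x + 2) mod 4
σ(x) = (x + 3) mod 4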

Simulate the network without a network
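The σ permutations on this slide can be illustrated in software. The following is a minimal sketch, not HAsim code, and it assumes one reading of the figure: routers are simulated in the order 0..3, router x writes one message per stream, and the message belongs to router σ(x) = (x + k) mod n. The function name `reorder` and the use of `int` as a message type are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sent[x] is the message router x wrote this model cycle.
// Returns the stream the receivers read in multiplexed order, arranged so that
// out[sigma(x)] == sent[x] with sigma(x) = (x + k) mod n.
std::vector<int> reorder(const std::vector<int>& sent, std::size_t k) {
    const std::size_t n = sent.size();
    std::vector<int> out(n);
    for (std::size_t x = 0; x < n; ++x) {
        out[(x + k) % n] = sent[x];
    }
    return out;
}
```

With four routers and k = 1, the messages `{10, 11, 12, 13}` are read back as `{13, 10, 11, 12}`: router 0 receives what router 3 sent, and no physical wires between routers are needed.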

Page 6: HAsim On-Chip Network  Model Configuration

On-Chip Network Model Multiplexed Topology

[Figure: multiplexed topology of the on-chip network model. Modules: L2 Coherence, LLC Hub, Core_OCN_Connection, LLC, Mesh Network, MemoryController. Every port in the figure is labeled with latency 1 and appears as a data/credit pair.]

Ports with portDataEnq and cred sides:

• OCN_LANE_SEND_Core_0, OCN_LANE_SEND_Core_1, OCN_LANE_SEND_Core_2
• OCN_LANE_RECV_Core_0, OCN_LANE_RECV_Core_1, OCN_LANE_RECV_Core_2
• CorePvtCache_to_UncoreQ, Uncore_to_CorePvtCacheQ
• LLCHub_to_LLC_req, LLC_to_LLCHub_rsp
• LLC_to_MEM_req, MEM_to_LLC_rsp

Ports with enq and credit sides:

• mesh_interconnect_enq_N / _E / _S / _W, mesh_interconnect_credit_N / _E / _S / _W
• Core_OCN_Connection_InQ_enq, Core_OCN_Connection_InQ_credit
• Core_OCN_Connection_OutQ_enq, Core_OCN_Connection_OutQ_credit
• ocn_to_memctrl_enq, ocn_to_memctrl_credit
• memctrl_to_ocn_enq, memctrl_to_ocn_credit

Page 7: HAsim On-Chip Network  Model Configuration

HAsim’s Network Model is Abstract

• In a software model the target network can be built at run-time
• Dynamism is expensive in FPGAs and recompilation is slow
• Solution: Constrained dynamism
  – Fixed parameters: Max nodes, max edges per node, max VCs
  – Dynamic:
    • Number of active contexts (nodes)
    • Endpoints of each edge (indirection table)
    • Routing table
    • Address mapping of distributed LLC
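The split between fixed and dynamic parameters can be pictured with a small software analogue. This is only a sketch under assumptions: the limits `kMaxNodes` and `kMaxEdgesPerNode`, the `NetworkConfig` structure and `configureRing` are invented for illustration and do not appear in HAsim. Capacities are compile-time constants, while the active node count and the edge indirection table are filled in at run time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative limits only; in the model these would be build-time parameters.
constexpr std::size_t kMaxNodes = 16;
constexpr std::size_t kMaxEdgesPerNode = 4;

struct NetworkConfig {
    std::size_t activeNodes = 0;  // run-time: number of active contexts
    // run-time: edgeDst[node][edge] is the node at the far end of that edge
    std::array<std::array<std::size_t, kMaxEdgesPerNode>, kMaxNodes> edgeDst{};
};

// Configure a uni-directional ring of n nodes using only edge 0 of each node.
// Returns false when n does not fit in the compiled-in capacity.
bool configureRing(NetworkConfig& cfg, std::size_t n) {
    if (n == 0 || n > kMaxNodes) return false;
    cfg.activeNodes = n;
    for (std::size_t i = 0; i < n; ++i) cfg.edgeDst[i][0] = (i + 1) % n;
    return true;
}
```

Changing the ring size or rewiring the edges only rewrites table contents. Exceeding `kMaxNodes` is the one case that would require rebuilding.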

Page 8: HAsim On-Chip Network  Model Configuration

Topology Manager

• Software – runs once at startup so no need to optimize
• HASIM_CHIP_TOPOLOGY_CLASS:
  – Manages streaming of parameters to the FPGA
  – Iterates over all software topology mapping classes until convergence
• Namespace defined by dictionaries
  – .dic files are preprocessed by LEAP tools
  – Hierarchy of enumerated types

Page 9: HAsim On-Chip Network  Model Configuration

How do I…

• Map address ranges to LLC segments?
• Map target cores to nodes?
• Pick a number of memory controllers and map them to nodes?
• Define a target machine network topology?
• Manage interleaving for multiplexing the network and cores?

Page 10: HAsim On-Chip Network  Model Configuration

Map Address Ranges to LLC Segments (SW)

Build a table of n_llc_map_entries, where each entry is an index to a portion of the distributed LLC.

icn-mesh.cpp:

for (int addr_idx = 0; addr_idx < n_llc_map_entries; addr_idx++)
{
    bool is_last = (addr_idx + 1 == n_llc_map_entries);
    topology->SendParam(TOPOLOGY_NET_LLC_ADDR_MAP,
                        &cores_net_pos[addr_idx % num_cores],
                        sizeof(TOPOLOGY_VALUE),
                        is_last);
}
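To see what the loop streams, here is a stand-alone sketch that collects the same sequence of values into a vector instead of calling `SendParam`. The function name `buildLlcMap` and the plain `int` positions are placeholders, not part of icn-mesh.cpp. Only the `addr_idx % num_cores` indexing is taken from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns one entry per address index, cycling through the cores' network
// positions in the same order as the loop on the slide.
std::vector<int> buildLlcMap(const std::vector<int>& coresNetPos, std::size_t nEntries) {
    std::vector<int> table;
    if (coresNetPos.empty()) return table;  // nothing to map onto
    table.reserve(nEntries);
    for (std::size_t i = 0; i < nEntries; ++i) {
        table.push_back(coresNetPos[i % coresNetPos.size()]);
    }
    return table;
}
```

With three cores at network positions 4, 5 and 6 and an eight-entry table, the result is `{4, 5, 6, 4, 5, 6, 4, 5}`. When the table size is not a multiple of the core count, the first few LLC segments get one extra entry each.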

Page 11: HAsim On-Chip Network  Model Configuration

Map Address Ranges to LLC Segments (FPGA)

Consume the table that was streamed in from SW

last-level-cache-no-coherence.bsv:

// Define a node that will stream in the topology. This builds a node
// on a ring. The node looks for messages tagged TOPOLOGY_NET_MEM_CTRL_MAP
// and emits associated payloads.
let ctrlAddrMapInit <- mkTopologyParamStream(`TOPOLOGY_NET_MEM_CTRL_MAP);

// Allocate a local memory and initialize it with the streamed-in entries.
LUTRAM#(Bit#(TLog#(TMul#(8, MAX_NUM_MEM_CTRLS))), STATION_ID) memCtrlDstForAddr <-
    mkLUTRAMWithGet(ctrlAddrMapInit);

// Map an address to a node ID using the table
function STATION_ID getMemCtrlDstForAddr(LINE_ADDRESS addr);
    // Use the low bits of the address as the index (resize does this).
    return memCtrlDstForAddr.sub(resize(addr));
endfunction
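The lookup relies on `resize` truncating the address to the width of the table index. A software equivalent, sketched here with invented names (`dstForAddr`, `int` station IDs) and assuming the table size is a power of two, masks off everything but the low index bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumes table is non-empty and table.size() is a power of two, which is the
// case for a memory indexed by a fixed-width bit vector.
int dstForAddr(const std::vector<int>& table, std::uint64_t lineAddr) {
    const std::uint64_t mask = table.size() - 1;  // keep only the low index bits
    return table[lineAddr & mask];
}
```

For an eight-entry table, line addresses 0x1002 and 0x2 select the same entry (index 2), so consecutive lines are interleaved across destinations in table order.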

Page 12: HAsim On-Chip Network  Model Configuration

Map Address Ranges to LLC Segments (LLC Hub)

rule . . .
    // Incoming request from core
    if (m_reqFromCore matches tagged Valid .req)
    begin
        // Which instance of the distributed cache is responsible?
        let dst = getLLCDstForAddr(req.physicalAddress);

        if (dst == local_station_id)
        begin
            // Local cache handles the address.
            if (can_enq_reqToLocalLLC &&& ! isValid(m_new_reqToLocalLLC))
            begin
                // Port to LLC is available. Send the local request.
                did_deq_reqFromCore = True;
                m_new_reqToLocalLLC = tagged Valid LLC_MEMORY_REQ { src: tagged Invalid,
                                                                    mreq: req };
                debugLog.record(cpu_iid, $format("1: Core REQ to local LLC, ") + fshow(req));
            end
        end
        else if (can_enq_reqToRemoteLLC && ! isValid(m_new_reqToRemoteLLC))
        begin
            // Remote cache instance handles the address and the OCN request port is available.
            //
            // These requests share the OCN request port since only one type of request goes to
            // a given remote station. Memory stations get memory requests above. LLC stations get
            // core requests here.
            did_deq_reqFromCore = True;
            m_new_reqToRemoteLLC = tagged Valid tuple2(dst, req);
            debugLog.record(cpu_iid, $format("1: Core REQ to LLC %0d, ", dst) + fshow(req));
        end
    end
    . . .
endrule

Page 13: HAsim On-Chip Network  Model Configuration

Map Cores and Memory Controllers to Nodes

• All computed (currently) in icn-mesh.cpp
• Given number of target cores and number of memory controllers:
  – Builds a rectangle of cores as close to square as possible
  – Adds a row of memory controllers at the top and bottom
  – Topology streamed to FPGA using same mechanism as address mapping

E.g., 15 cores and 3 memory controllers:

x M M x
C C C C
C C C C
C C C C
C C C x
x M x x
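The placement rules can be sketched as follows. This is a guess that happens to reproduce the 15-core, 3-controller example, not the algorithm in icn-mesh.cpp: it assumes the width is the ceiling of the square root of the core count, that the top row receives the extra controller when the count is odd, and that controllers are centered in their row. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Place `count` copies of `mark` centered in a row of `width` cells; 'x' is unused.
static std::string centeredRow(std::size_t width, std::size_t count, char mark) {
    assert(count <= width);  // assumption: a controller row fits in the mesh width
    std::string row(width, 'x');
    const std::size_t start = (width - count) / 2;
    for (std::size_t i = 0; i < count; ++i) row[start + i] = mark;
    return row;
}

// Guessed layout rule. Assumes numCores >= 1.
std::vector<std::string> layout(std::size_t numCores, std::size_t numMemCtrls) {
    std::size_t width = 1;
    while (width * width < numCores) ++width;        // ceil(sqrt(numCores))
    const std::size_t rows = (numCores + width - 1) / width;
    const std::size_t top = (numMemCtrls + 1) / 2;   // top row gets the extra one

    std::vector<std::string> grid;
    grid.push_back(centeredRow(width, top, 'M'));
    for (std::size_t r = 0; r < rows; ++r) {
        std::string row(width, 'x');
        for (std::size_t c = 0; c < width && r * width + c < numCores; ++c) row[c] = 'C';
        grid.push_back(row);
    }
    grid.push_back(centeredRow(width, numMemCtrls - top, 'M'));
    return grid;
}
```

`layout(15, 3)` yields the rows `xMMx`, `CCCC`, `CCCC`, `CCCC`, `CCCx`, `xMxx`, matching the example grid. Other core and controller counts may be placed differently by the real code.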

Page 14: HAsim On-Chip Network  Model Configuration

Network Topology: Map Cores/Memory Controllers to Nodes

• Multiplexed order of nodes is the same as order of cores
  – No permutations required for local port
• Nodes are connected to:
  – Core
  – Memory controller
  – Nothing
• The node doesn’t care what is connected!
• Hide indirection in ports

Page 15: HAsim On-Chip Network  Model Configuration

Network Topology: Map Cores/Memory Controllers to Nodes

In icn-mesh.bsv:

//
// Local ports are a dynamic combination of CPUs, memory controllers, and
// NULL connections.
//
// localPortMap indicates, for each multiplexed port instance ID, the type
// of local port attached (CPU, memory controller, NULL).
//
let localPortInit <- mkTopologyParamStream(`TOPOLOGY_NET_LOCAL_PORT_TYPE_MAP);
LUTRAM#(Bit#(TLog#(TAdd#(TAdd#(MAX_NUM_CPUS, 1), NUM_STATIONS))), Bit#(2)) localPortMap <-
    mkLUTRAMWithGet(localPortInit);

PORT_SEND_MULTIPLEXED#(MAX_NUM_CPUS, OCN_MSG) enqToCores <-
    mkPortSend_Multiplexed("Core_OCN_Connection_InQ_enq");
PORT_SEND_MULTIPLEXED#(MAX_NUM_MEM_CTRLS, OCN_MSG) enqToMemCtrl <-
    mkPortSend_Multiplexed("ocn_to_memctrl_enq");
PORT_SEND_MULTIPLEXED#(NUM_STATIONS, OCN_MSG) enqToNull <-
    mkPortSend_Multiplexed_NULL();

let enqToLocal <- mkPortSend_Multiplexed_Split3(enqToCores, enqToMemCtrl, enqToNull, localPortMap);
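The effect of splitting one multiplexed local port three ways can be modeled in a few lines. This sketch is an interpretation, not the implementation of `mkPortSend_Multiplexed_Split3`: it assumes that nodes are visited in multiplexed order and that the instance index within each destination port is simply the count of earlier nodes of the same type. The `LocalPort` and `Route` types are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class LocalPort { Cpu, MemCtrl, Null };

struct Route {
    LocalPort kind;     // which of the three ports receives the node's traffic
    std::size_t index;  // instance within that port's own multiplexed stream
};

// For each node, in multiplexed order, pick the destination port and the
// instance index within it (assumed: a running count per port type).
std::vector<Route> split3(const std::vector<LocalPort>& localPortMap) {
    std::size_t next[3] = {0, 0, 0};
    std::vector<Route> routes;
    routes.reserve(localPortMap.size());
    for (LocalPort kind : localPortMap) {
        routes.push_back({kind, next[static_cast<std::size_t>(kind)]++});
    }
    return routes;
}
```

Under this assumption the k-th CPU-typed node maps to core instance k, which is consistent with the multiplexed order of nodes matching the order of cores.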

Page 16: HAsim On-Chip Network  Model Configuration

Network Topology: Defining Inter-Node Edges

Each network node:

[Figure: a node with five ports: Local, N, E, S, W]

Page 17: HAsim On-Chip Network  Model Configuration

Network Multiplexing

• Logically, there are n nodes in the network.
• Each has a local port connected either to a core, to memory or to nothing.
• Network connection mapping and routing will determine the topology.
• Topology manager defines the routing table.
• Note: Dateline not yet implemented
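One way to picture routing determining the topology: keep the wiring fixed and let the routing function decide which edges are ever used. The sketch below covers a single row of nodes and is not HAsim's routing table. `nextHop`, the `Dir` type and the shortest-way-around policy for the wrapped case are assumptions made for illustration.

```cpp
#include <cassert>

enum class Dir { Local, East, West };

// Next hop along one row of `width` nodes. With wrap=true the shorter way
// around is chosen (ties go east), so the wraparound edge is used. With
// wrap=false the wraparound edge is never selected, although it may still
// be wired.
Dir nextHop(int cur, int dst, int width, bool wrap) {
    if (cur == dst) return Dir::Local;
    const int right = (dst - cur + width) % width;  // hops going east with wraparound
    if (wrap) return (right <= width - right) ? Dir::East : Dir::West;
    return (dst > cur) ? Dir::East : Dir::West;
}
```

In a row of four, node 0 reaches node 3 by going west across the wraparound edge when `wrap` is true, and by going east through nodes 1 and 2 when it is false. The connections are identical in both cases and only the table contents differ.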

Page 18: HAsim On-Chip Network  Model Configuration

Network Topology and Routing

Torus:

Page 19: HAsim On-Chip Network  Model Configuration

Network Topology and Routing

Mesh (connections identical, routing table ignores some edges):

Page 20: HAsim On-Chip Network  Model Configuration

Network Topology and Routing

Bi-directional ring:

Page 21: HAsim On-Chip Network  Model Configuration

Network Topology and Routing

Uni-directional ring:

Page 22: HAsim On-Chip Network  Model Configuration

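Final Problem: Multiplexing On-Chip Network Routers

[Figure: Router0, Router1, Router2 and Router3 multiplexed onto a single physical router (Router0..3). Columns: cur, to 1, to 2, to 3, fr 1, fr 2, fr 3. Rows: routers 0, 1, 2, 3. Each "to" stream passes through a reorder stage before arriving as a "fr" stream, one permutation per stream:]

σ(x) = (x + 1) mod 4
σ(x) = (x + 2) mod 4
σ(x) = (x + 3) mod 4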

Page 23: HAsim On-Chip Network  Model Configuration

Network Topology: Communication Across Multiplexed Nodes

• Each node talks to a different multiplexed node instance
• Naïve port binding would have each node talk only to itself
• A-Ports are already buffered
• Bury transformation in A-Ports
• Retain simple read next / write next port semantics within models

Page 24: HAsim On-Chip Network  Model Configuration

Network Topology: Communication Across Multiplexed Nodes

icn-mesh.bsv:

// Initialization from topology manager
ReadOnly#(STATION_IID) meshWidth <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_WIDTH);
ReadOnly#(STATION_IID) meshHeight <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_HEIGHT);

// Outbound and inbound ports are loopbacks to the same multiplexed module. Ports connect
// to logically different nodes but physically to the same simulator object.
Vector#(NUM_PORTS, PORT_SEND_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqTo = newVector();
Vector#(NUM_PORTS, PORT_RECV_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqFrom = newVector();

// Outbound port is a normal A-Port. It has no buffering.
enqTo[portEast] <- mkPortSend_Multiplexed("mesh_interconnect_enq_E");

// Inbound port provides buffering for multiplexing. Instead of forwarding messages FIFO
// it must transform the messages so they cross to the correct multiplexed instance when
// instances (nodes) are traversed sequentially.
enqFrom[portWest] <-
    mkPortRecv_Multiplexed_ReorderLastToFirstEveryN("mesh_interconnect_enq_E", 1, meshWidth, meshHeight);
. . .
enqFrom[portEast] <-
    mkPortRecv_Multiplexed_ReorderFirstToLastEveryN("mesh_interconnect_enq_W", 1, meshWidth, meshHeight);
enqFrom[portSouth] <-
    mkPortRecv_Multiplexed_ReorderFirstNToLastN("mesh_interconnect_enq_N", 1, meshWidth);
enqFrom[portNorth] <-
    mkPortRecv_Multiplexed_ReorderLastNToFirstN("mesh_interconnect_enq_S", 1, meshWidth);
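The reordering performed by the inbound ports can be approximated in software. This sketch is a guess based only on the name `ReorderLastToFirstEveryN`, not on the port's source. It assumes that nodes are numbered row by row, that N is the mesh width, and that within each row the last message written moves to the front so that every node reads what its western neighbor sent eastward.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Guessed behaviour: within each consecutive group of n messages, the last
// becomes the first and the rest shift back by one. Assumes in.size() is a
// multiple of n.
std::vector<int> reorderLastToFirstEveryN(const std::vector<int>& in, std::size_t n) {
    std::vector<int> out(in.size());
    if (n == 0) return out;  // avoid a non-terminating loop on bad input
    for (std::size_t base = 0; base + n <= in.size(); base += n) {
        out[base] = in[base + n - 1];
        for (std::size_t i = 1; i < n; ++i) out[base + i] = in[base + i - 1];
    }
    return out;
}
```

For a 4-wide, 2-high arrangement, messages `{0, 1, 2, 3, 4, 5, 6, 7}` sent east are read as `{3, 0, 1, 2, 7, 4, 5, 6}`: node 1 reads node 0's message and node 0 reads the message that wrapped around from node 3. The north/south variants would presumably move whole rows of N messages instead of single entries.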