Fabric Interfaces Architecture Sean Hefty - Intel Corporation
Page 1: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Fabric Interfaces Architecture

Sean Hefty - Intel Corporation

Page 2: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Changes

• v2
  – Remove interface object
  – Add open interface as base object
  – Add SRQ object
  – Add EQ group object

• v3
  – Modified SRQ
  – Enhanced architecture semantics

www.openfabrics.org 2

Page 3: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Overview

• Object Model
  – Do we have the right types of objects defined?
  – Do we have the correct object relationships?

• Interface Synopsis
  – High-level description of object operations
  – Is functionality missing?
  – Are interfaces associated with the right object?

• Architectural Semantics
  – Do the semantics match well with the apps?
  – What semantics are missing?

Page 4: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Object “Class” Model

• Objects represent collection of attributes and interfaces
  – I.e. object-oriented programming model

• Consider architectural model only at this point

Objects do not necessarily map directly to hardware or software objects

Page 5: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Conceptual Object Hierarchy

[Figure: object “inheritance” diagram. Recoverable labels: Fabric Descriptor; Fabric; Domain; Address Vector (Map, Index); Endpoint (Msg, Passive, Active, Datagram, RDM); Dispatcher; Event Queue (Completion, CM, AV, Domain); EQ Group; Counter; Memory Region; Interfaces.]

Page 6: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Object Relationships

[Figure: object “scope” diagram. Recoverable labels: Fabric; Passive EP; EQ (CM); Domain; AV (Map, Index); Active EP (Msg, Datagram, RDM); Dispatch EP; EQ Group; EQ (CQ, CM, AV, Domain); Counter; MR.]

Page 7: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Fabric

• Represents a communication domain or boundary
  – Single IB or RoCE subnet, IP (iWarp) network, Ethernet subnet
• Multiple local NICs / ports
• Topology data, network time stamps
• Determines native addressing
  – Mapped addressing possible
  – GID/LID versus IP

[Figure: Fabric scope, with Passive EP, EQ, and Domain]

Page 8: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Passive (Fabric) EP

• Listening endpoint
  – Connection-oriented protocols
• Wildcard listen across multiple NICs / ports
• Bind to address to restrict listen
  – Listen may migrate with address

Page 9: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Fabric EQ

• Associated with passive endpoint(s)
• Reports connection requests
• Could be used to report fabric events

Page 10: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Resource Domain

• Boundary for resource sharing
  – Physical or logical NIC
  – Command queue
• Container for data transfer resources
• A provider may define multiple domains for a single NIC
  – Dependent on resource sharing

Page 11: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Domain Address Vectors

• Maintains list of remote endpoint addresses
  – Map – native addressing
  – Index – ‘rank’-based addressing
• Resolves higher-level addresses into fabric addresses
  – Native addressing abstracted from user
• Handles address and route changes

[Figure: Domain scope, with AV, Active EP, EQ, EQ Group, Counter, and MR]

Page 12: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Domain Endpoints

• Data transfer portal
  – Send / receive queues
  – Command queues
  – Ring buffers
  – Buffer dispatching
• Multiple types defined
  – Connection-oriented / connectionless
  – Reliable / unreliable
  – Message / stream

Page 13: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Domain Event Queues

• Reports asynchronous events
• Unexpected errors reported ‘out of band’
• Events separated into ‘EQ domains’
  – CM, AV, completions
  – 1 EQ domain per EQ
  – Future support for merged EQ domains

Page 14: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

EQ Groups

• Collection of EQs
• Conceptually shares same wait object
• Grouping for progress and wait operations

Page 15: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Domain Counters

• Provides a count of successful completions of asynchronous operations
  – Conceptual HW counter
• Count is independent from an actual event reported to the user through an EQ

Page 16: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Domain Memory Regions

• Memory ranges accessible by fabric resources
  – Local and/or remote access
• Defines permissions for remote access

Page 17: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Interface Synopsis

• Operations associated with identified ‘classes’
• General functionality, versus detailed methods
  – The full set of methods is not defined here
  – Detailed behavior (e.g. blocking) is not defined
• Identify missing and unneeded functionality
  – Mapping of functionality to objects

Use timeboxing to limit scope of interfaces to refine by a target date

Page 18: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Base Class

Operation   Description
Close       Destroy / free object
Bind        Create an association between two object instances
Sync        Fencing operation that completes only after previously issued asynchronous operations have completed
Control     (~fcntl) set/get low-level object behavior
I/F Open    Open provider extended interfaces
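One way to picture a base “class” in C is a table of function pointers that every object carries. The sketch below is illustrative only: the names `base_obj`, `base_ops`, `obj_close`, `obj_bind`, `obj_sync`, and the `demo_*` class are hypothetical, and the signatures are assumptions, since the slides deliberately leave detailed methods undefined.

```c
#include <stddef.h>

struct base_obj;

/* Hypothetical operation table mirroring the five base-class rows. */
struct base_ops {
    int (*close)(struct base_obj *obj);
    int (*bind)(struct base_obj *obj, struct base_obj *other);
    int (*sync)(struct base_obj *obj);
    int (*control)(struct base_obj *obj, int command, void *arg);
    int (*if_open)(struct base_obj *obj, const char *name, void **ops);
};

/* Every object starts with a pointer to its class's operations. */
struct base_obj {
    const struct base_ops *ops;
    struct base_obj *bound;   /* object associated via bind, if any */
    int is_open;
};

/* Call wrappers: return -1 when the class does not implement the op. */
static int obj_close(struct base_obj *obj)
{
    return obj->ops->close ? obj->ops->close(obj) : -1;
}

static int obj_bind(struct base_obj *obj, struct base_obj *other)
{
    return obj->ops->bind ? obj->ops->bind(obj, other) : -1;
}

static int obj_sync(struct base_obj *obj)
{
    return obj->ops->sync ? obj->ops->sync(obj) : -1;
}

/* Example class that implements only close and bind. */
static int demo_close(struct base_obj *obj)
{
    obj->is_open = 0;
    return 0;
}

static int demo_bind(struct base_obj *obj, struct base_obj *other)
{
    obj->bound = other;
    return 0;
}

static const struct base_ops demo_ops = {
    .close = demo_close,
    .bind = demo_bind,
};
```

With this layout, binding an endpoint to an event queue would be `obj_bind(&ep, &eq)`, and an unimplemented operation such as `obj_sync(&ep)` reports failure instead of crashing.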

Page 19: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Fabric

Operation   Description
Domain      Open a resource domain
Endpoint    Create a listening EP for connection-oriented protocols
EQ Open     Open an event queue for listening EP or reporting fabric events

Page 20: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Resource Domain

Operation                        Description
Query                            Obtain domain specific attributes
Open AV, EQ, EP, SRQ, EQ Group   Create an address vector, event or completion counter, event queue, endpoint, shared receive queue, or EQ group
MR Ops                           Register data buffers for access by fabric resources

Page 21: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Address Vector

Operation   Description
Insert      Insert one or more addresses into the vector
Remove      Remove one or more addresses from the vector
Lookup      Return a stored address
Straddr     Convert an address into a printable string
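The four address vector operations can be sketched for the index (‘rank’-based) flavor as follows. Everything here is hypothetical: the `struct av` layout, the fixed capacity, the 64-bit opaque address, and the function names are assumptions made for illustration, not part of the proposed interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define AV_CAPACITY 8   /* hypothetical fixed size, for illustration */

/* Hypothetical index-type address vector holding opaque 64-bit addresses. */
struct av {
    uint64_t addr[AV_CAPACITY];
    unsigned char used[AV_CAPACITY];
};

/* Insert: store addr in the first free slot; return its index or -1 if full. */
static int av_insert(struct av *av, uint64_t addr)
{
    for (int i = 0; i < AV_CAPACITY; i++) {
        if (!av->used[i]) {
            av->addr[i] = addr;
            av->used[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Remove: free the slot at idx; return 0 on success, -1 if idx is invalid. */
static int av_remove(struct av *av, int idx)
{
    if (idx < 0 || idx >= AV_CAPACITY || !av->used[idx])
        return -1;
    av->used[idx] = 0;
    return 0;
}

/* Lookup: copy the stored address to *out; return 0 on success, -1 otherwise. */
static int av_lookup(const struct av *av, int idx, uint64_t *out)
{
    if (idx < 0 || idx >= AV_CAPACITY || !av->used[idx])
        return -1;
    *out = av->addr[idx];
    return 0;
}

/* Straddr: format an address as a printable hex string in buf. */
static const char *av_straddr(uint64_t addr, char *buf, size_t len)
{
    snprintf(buf, len, "0x%016llx", (unsigned long long)addr);
    return buf;
}
```

The application then addresses peers by the small integer returned from `av_insert`, which is what keeps native addressing abstracted from the user.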

Page 22: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Base EP

Operation   Description
Enable      Enables an active EP for data transfers
Cancel      Cancel a pending asynchronous operation
Getopt      (~getsockopt) get protocol specific EP options
Setopt      (~setsockopt) set protocol specific EP options

Page 23: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Passive EP

Operation   Description
Getname     (~getsockname) return EP address
Listen      Start listening for connection requests
Reject      Reject a connection request

Page 24: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Active EP

Operation   Description
CM          Connection establishment ops, usable by connection-oriented and connectionless endpoints
MSG         2-sided message queue ops, to send and receive messages
RMA         1-sided RDMA read and write ops
Tagged      2-sided matched message ops, to send and receive messages (conceptual merge of messages and RMA writes)
Atomic      1-sided atomic ops
Triggered   Deferred operations initiated on a condition being met

Page 25: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Event Queue

Operation   Description
Read        Retrieve a completion event, and optional source endpoint address data for received data transfers
Read Err    Retrieve event data about an operation that completed with an unexpected error
Write       Insert an event into the queue
Reset       Directs the EQ to signal its wait object when a specified condition is met
Strerror    Converts error data associated with a completion into a printable string

Page 26: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

EQ Group

Operation   Description
Poll        Check EQs for events
Wait        Wait for an event on the EQ group

Page 27: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Completion Counter

Operation   Description
Read        Retrieve a counter’s value
Add         Increment a counter
Set         Set / clear a counter’s value
Wait        Wait until a counter reaches a desired threshold
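The counter operations are simple enough to sketch directly. The names below (`struct counter`, `counter_read`, `counter_add`, `counter_set`, `counter_reached`) are hypothetical, and `counter_reached` is a non-blocking stand-in for Wait: a real Wait would block or drive progress until the threshold is met, which the slides do not specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter object; the slides define only the operations. */
struct counter {
    uint64_t value;
};

/* Read: retrieve the counter's value. */
static uint64_t counter_read(const struct counter *c)
{
    return c->value;
}

/* Add: increment the counter by n. */
static void counter_add(struct counter *c, uint64_t n)
{
    c->value += n;
}

/* Set: overwrite the counter's value (v == 0 clears it). */
static void counter_set(struct counter *c, uint64_t v)
{
    c->value = v;
}

/* Non-blocking stand-in for Wait: has the counter reached the threshold? */
static bool counter_reached(const struct counter *c, uint64_t threshold)
{
    return c->value >= threshold;
}
```

An application that posts four transfers could, under these assumptions, test `counter_reached(&c, 4)` instead of reading four separate completion events from an EQ.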

Page 28: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Memory Region

Operation   Description
Desc        (~lkey) Optional local memory descriptor associated with a data buffer
Key         (~rkey) Protection key against access from remote data transfers
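As a rough illustration of how a protection key and access permissions might gate a remote transfer, consider the check below. The `struct mem_region` layout, the `MR_REMOTE_*` bits, and `mr_allows` are all hypothetical; the slides only state that a region has a key and defines permissions for remote access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits for remote access. */
enum {
    MR_REMOTE_READ  = 1u << 0,
    MR_REMOTE_WRITE = 1u << 1,
};

/* Hypothetical registered region: address range, key (~rkey), permissions. */
struct mem_region {
    uintptr_t base;
    size_t len;
    uint64_t key;
    unsigned access;
};

/* True if a remote transfer presenting `key` may perform `access` on
 * [addr, addr + len) within the region. The range test avoids overflow. */
static bool mr_allows(const struct mem_region *mr, uint64_t key,
                      uintptr_t addr, size_t len, unsigned access)
{
    if (key != mr->key)
        return false;
    if ((mr->access & access) != access)
        return false;
    if (addr < mr->base || len > mr->len)
        return false;
    return addr - mr->base <= mr->len - len;
}
```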

Page 29: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Architectural Semantics

• Progress
• Ordering - completions and data delivery
• Multi-threading and locking model
• Buffering
• Function signatures and semantics

Once defined, object and interface semantics cannot change – semantic changes require new objects and interfaces

Need refining

Page 30: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Progress

• Ability of the underlying implementation to complete processing of an asynchronous request

• Need to consider ALL asynchronous requests
  – Connections, address resolution, data transfers, event processing, completions, etc.

• HW/SW mix

All(?) current solutions require significant software components

Page 31: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Progress

• Support two progress models
  – Automatic and implicit

• Separate operations as belonging to one of two progress domains
  – Data or control
  – Report progress model for each domain

SAMPLE    Implicit   Automatic
Data      Software   Hardware offload
Control   Software   Kernel services

Page 32: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Automatic Progress

• Implies hardware offload model
  – Or standard kernel services / threads for control operations

• Once an operation is initiated, it will complete without further user intervention or calls into the API

• Automatic progress meets implicit model by definition

Page 33: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Implicit Progress

• Implies significant software component
• Occurs when reading or waiting on EQ(s)
• Application can use separate EQs for control and data
• Progress limited to objects associated with selected EQ(s)
• App can request automatic progress
  – E.g. app wants to wait on native wait object
  – Implies provider allocated threading
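A toy model may help show what “progress occurs when reading the EQ” means in practice. In the sketch below, submitted operations sit untouched until the application reads the queue, at which point software processes them and reports completions. The `struct event_queue` layout, the queue depth, and the function names are hypothetical, and operations are reduced to integer IDs.

```c
#include <stdbool.h>
#include <stddef.h>

#define EQ_DEPTH 16   /* hypothetical queue depth */

/* Hypothetical software-provider state behind one event queue. */
struct event_queue {
    int pending[EQ_DEPTH];     /* submitted, not yet processed */
    size_t n_pending;
    int completed[EQ_DEPTH];   /* processed, not yet reported */
    size_t n_completed;
};

/* Submit an operation. Nothing is processed here: no automatic progress. */
static bool eq_submit(struct event_queue *eq, int op_id)
{
    if (eq->n_pending == EQ_DEPTH)
        return false;
    eq->pending[eq->n_pending++] = op_id;
    return true;
}

/* Process pending operations and queue their completions. */
static void eq_progress(struct event_queue *eq)
{
    size_t done = 0;
    while (done < eq->n_pending && eq->n_completed < EQ_DEPTH)
        eq->completed[eq->n_completed++] = eq->pending[done++];

    /* Keep any operations that did not fit in the completion list. */
    size_t left = eq->n_pending - done;
    for (size_t i = 0; i < left; i++)
        eq->pending[i] = eq->pending[done + i];
    eq->n_pending = left;
}

/* Read: drives progress implicitly, then returns the oldest completion. */
static bool eq_read(struct event_queue *eq, int *op_id)
{
    eq_progress(eq);
    if (eq->n_completed == 0)
        return false;
    *op_id = eq->completed[0];
    for (size_t i = 1; i < eq->n_completed; i++)
        eq->completed[i - 1] = eq->completed[i];
    eq->n_completed--;
    return true;
}
```

If the application never calls `eq_read`, the submitted operation never completes, which is the practical difference from the automatic model.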

Page 34: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Ordering

• Applies to a single initiator endpoint performing data transfers to one target endpoint over the same data flow
  – Data flow may be a conceptual QoS level or path through the network

• Separate ordering domains
  – Completions, message, data

• Fenced ordering may be obtained using fi_sync operation

Page 35: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Completion Ordering

• Order in which operation completions are reported relative to their submission

• Unordered or ordered
  – No defined requirement for ordered completions

• Default: unordered

Page 36: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Message Ordering

• Order in which message (transport) headers are processed
  – I.e. whether transport messages are received in or out of order

• Determined by selection of ordering bits
  – [Read | Write | Send] After [Read | Write | Send]
  – RAR, RAW, RAS, WAR, WAW, WAS, SAR, SAW, SAS

• Example:
  – fi_order = 0 // unordered
  – fi_order = RAR | RAW | RAS | WAW | WAS | SAW | SAS // IB/iWarp ordering
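The ordering bits above lend themselves to a bitmask. The following is a minimal sketch, assuming one bit per combination; the `ORDER_*` names, their bit positions, and `order_satisfies` are hypothetical, since the slides name the nine combinations but not their encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for [Read|Write|Send] After [Read|Write|Send]. */
enum {
    ORDER_RAR = 1u << 0,   /* read  after read  */
    ORDER_RAW = 1u << 1,   /* read  after write */
    ORDER_RAS = 1u << 2,   /* read  after send  */
    ORDER_WAR = 1u << 3,   /* write after read  */
    ORDER_WAW = 1u << 4,   /* write after write */
    ORDER_WAS = 1u << 5,   /* write after send  */
    ORDER_SAR = 1u << 6,   /* send  after read  */
    ORDER_SAW = 1u << 7,   /* send  after write */
    ORDER_SAS = 1u << 8,   /* send  after send  */
};

/* The mask given in the slide's "IB/iWarp ordering" example. */
#define ORDER_IB_IWARP (ORDER_RAR | ORDER_RAW | ORDER_RAS | \
                        ORDER_WAW | ORDER_WAS | ORDER_SAW | ORDER_SAS)

/* True if every ordering guarantee in `required` is present in `offered`. */
static bool order_satisfies(uint32_t offered, uint32_t required)
{
    return (offered & required) == required;
}
```

Under this encoding, an application that needs writes ordered after reads would find `order_satisfies(ORDER_IB_IWARP, ORDER_WAR)` false, because WAR and SAR are the two combinations absent from the slide's IB/iWarp example.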

Page 37: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Data Ordering

• Delivery order of transport data into target memory
  – Ordering per byte-addressable location
  – I.e. access to the same byte in memory

• Ordering constrained by message ordering rules
  – Must at least have message ordering first

Page 38: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Data Ordering

• Ordering limited to message order size
  – E.g. MTU
  – In order data delivery if transfer <= message order size

• Message order size = 0
  – No data ordering

• Message order size = -1
  – All data ordered
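The message order size rule reduces to a small predicate. This sketch assumes sizes are carried as `size_t`, so that -1 becomes the largest representable value; `ORDER_SIZE_ALL` and `data_is_ordered` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel for "message order size = -1" (all data ordered). */
#define ORDER_SIZE_ALL ((size_t)-1)

/* Data is delivered in order only when the transfer fits within the
 * message order size. A size of 0 means no data ordering at all. */
static bool data_is_ordered(size_t transfer_len, size_t order_size)
{
    if (order_size == 0)
        return false;
    if (order_size == ORDER_SIZE_ALL)
        return true;
    return transfer_len <= order_size;
}
```

With an order size equal to a 4096-byte MTU, for example, a 4096-byte transfer is ordered and a 4097-byte transfer is not.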

Page 39: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Other Ordering Rules

• Ordering to different target endpoints not defined
• Per message ordering semantics implemented using different data flows
  – Data flows may be less flexible, but easier to optimize for
  – Endpoint aliases may be configured to use different data flows

Page 40: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Multi-threading and Locking

• Support both thread safe and lockless models
  – Compile time and run time support
  – Run-time limited to compiled support

• Lockless (based on MPI model)
  – Single – single-threaded app
  – Funneled – only 1 thread calls into interfaces
  – Serialized – only 1 thread at a time calls into interfaces

• Thread safe
  – Multiple – multi-threaded app, with no restrictions
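The four threading levels and the “run-time limited to compiled support” rule might be modeled as below. The enum and function names are hypothetical. The sketch also assumes that a build supporting a given level supports every more restrictive level too, which the slides do not state explicitly.

```c
#include <stdbool.h>

/* Hypothetical names for the four levels, ordered from most to least
 * restrictive, following the MPI thread levels the slide refers to. */
enum thread_model {
    THREAD_SINGLE,       /* single-threaded app */
    THREAD_FUNNELED,     /* only 1 thread calls into interfaces */
    THREAD_SERIALIZED,   /* only 1 thread at a time calls into interfaces */
    THREAD_MULTIPLE,     /* multi-threaded app, with no restrictions */
};

/* Only the thread-safe model requires the provider to take locks. */
static bool provider_must_lock(enum thread_model m)
{
    return m == THREAD_MULTIPLE;
}

/* A run-time request can be granted only if it does not exceed the level
 * the library was compiled to support (assumed to be inclusive). */
static bool thread_model_supported(enum thread_model requested,
                                   enum thread_model compiled_max)
{
    return requested <= compiled_max;
}
```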

Page 41: Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Buffering

• Support both application and network buffering
  – Zero-copy for high-performance
  – Network buffering for ease of use

• Buffering in local memory or NIC
  – In some cases, buffered transfers may be higher-performing (e.g. “inline”)

• Registration option for local NIC access
  – Migration to fabric managed registration

• Required registration for remote access
  – Specify permissions