Extensible Message Layers for Multimedia Cluster Computers Dr. Craig Ulmer Center for Experimental Research in Computer Systems

Dec 27, 2015

Page 1

Extensible Message Layers for Multimedia Cluster Computers

Dr. Craig Ulmer

Center for Experimental Research in Computer Systems

Page 2

Outline

Background
- Evolution of cluster computers
- Multimedia of “Resource-rich” cluster computers

Design of extensible message layers
- GRIM: General-purpose Reliable In-order Messages

Extensions
- Integrating peripheral devices
- Streaming computations

Host-to-host performance

Concluding remarks

Page 3

Background

An Evolution of Cluster Computers

Page 4

Cluster Computers

Cost-effective alternative to supercomputers
- Number of commodity workstations
- Specialized network hardware and software

Result: Large pool of host processors

[Figure: four workstations, each with a CPU, memory, network interface, and I/O bus, connected by a System Area Network]

Page 5

Improving Cluster Computers

Adding more host CPUs

Adding intelligent peripheral devices

[Figure labels: Peripheral Devices, Host CPUs]

Page 6

Peripheral Device Trends

Increasingly independent, intelligent peripheral devices

Feature on-card processing and memory facilities

Migration of computing power and bandwidth requirements to peripherals

[Figure: host containing a CPU and peripheral devices (Ethernet, storage, SAN NI, media capture)]

Page 7

Resource-Rich Cluster Computers

Inclusion of diverse peripheral devices
- Ethernet server cards, multimedia capture devices, embedded storage, computational accelerators

Processing takes place in host CPUs and peripherals

[Figure: cluster of hosts on a System Area Network; hosts carry CPUs plus peripherals such as SAN NI, Ethernet, video capture, FPGA, and storage cards]

Page 8

Benefits of Resource-Rich Clusters

Employ cluster computing in new applications
- Real-time constraints
- I/O intensive
- Network

Example: Digital libraries
- Enormous amounts of data
- Large number of network users

Example: Multimedia
- Capture and process large streams of multimedia data
- CAVE or Visualization clusters

Page 9

Extensible Message Layers

Supporting Resource-Rich Cluster Computers

Page 10

Problem: Utilizing distributed cluster resources

How is efficient intra-cluster communication provided?

How can applications make use of resources?

[Figure: many CPUs alongside video capture, FPGA, RAID, and Ethernet devices, with question marks over how they connect]

Page 11

Answer: Flexible “Message Layer” Communication Software

Message layers are enabling technology for clusters
- Enable cluster to function as single image multiprocessor system

Current message layers
- Optimized for transmissions between host CPUs
- Peripheral devices only available in context of the local host

What is needed
- Support efficient communication with host CPUs and peripherals
- Ability to harness peripheral devices as pool of resources

Page 12

GRIM: An Implementation

A message layer for resource-rich clusters

Page 13

General-purpose Reliable In-order Message Layer (GRIM)

Message layer for resource-rich clusters
- Myrinet SAN backbone
- Both host CPUs and peripheral devices are endpoints
- Communication core implemented in NI

[Figure: CPU, FPGA card, and storage card attached to the GRIM core in the network interface card, which connects to the System Area Network]

Page 14

Per-hop Flow Control

End-to-end flow control necessary for reliable delivery
- Prevents buffer overflows in communication path

Endpoint-managed schemes
- Impractical for peripheral devices

Per-hop flow control scheme
- Transfer data as soon as next stage can accept
- Optimistic approach

[Figure: path from the sending endpoint over PCI, network interface, SAN, network interface, and PCI to the receiving endpoint; contrasts end-to-end Send/Reply with per-hop DATA/ACK exchanges on each link]
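The per-hop idea can be sketched in a few lines of Python. This is a toy model, not GRIM's NI firmware: the `Stage` class, its `capacity` values, and the `send_all` helper are assumptions made for illustration, and the `can_accept` check stands in for the DATA/ACK handshake on each link. A packet advances only when the next hop has a free buffer slot, so no buffer overflows and packets stay in order.

```python
from collections import deque


class Stage:
    """One hop in the path (for example an NI) with a bounded buffer."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.buffer = deque()

    def can_accept(self):
        # Stands in for the per-hop ACK: is there a free slot downstream?
        return len(self.buffer) < self.capacity


def step(stages, delivered):
    """Advance every hop by at most one packet, downstream hops first."""
    last = stages[-1]
    if last.buffer:
        # Simplification: the receiving endpoint always accepts.
        delivered.append(last.buffer.popleft())
    for i in range(len(stages) - 2, -1, -1):
        cur, nxt = stages[i], stages[i + 1]
        if cur.buffer and nxt.can_accept():
            nxt.buffer.append(cur.buffer.popleft())


def send_all(packets, stages):
    """Push packets through the hops and return them in delivery order."""
    pending = deque(packets)
    delivered = []
    while len(delivered) < len(packets):
        step(stages, delivered)
        if pending and stages[0].can_accept():
            stages[0].buffer.append(pending.popleft())
        # Invariant of per-hop flow control: no buffer exceeds its capacity.
        assert all(len(s.buffer) <= s.capacity for s in stages)
    return delivered


# Hypothetical two-hop path with small buffers.
path = [Stage("sending NI", 2), Stage("receiving NI", 1)]
result = send_all(list(range(10)), path)
print(result)
```

Because each hop only looks at its immediate neighbor, an endpoint never has to track buffer space along the whole path, which is the property the slide highlights for peripheral devices.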

Page 15

Logical Channels

Multiple endpoints in a host share the NI

Employ multiple logical channels in the NI
- Each endpoint owns one or more logical channels
- Logical channel provides virtual interface to network

[Figure: Endpoint 1 through Endpoint n, each attached to a logical channel in the network interface; a scheduler connects the channels to the network]
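A minimal sketch of how several logical channels might share one NI follows. The slides do not state the scheduling policy, so the round-robin choice here is an assumption, as are the `LogicalChannel` and `Scheduler` class names.

```python
from collections import deque


class LogicalChannel:
    """Per-endpoint outgoing queue: the endpoint's virtual view of the NI."""

    def __init__(self, owner):
        self.owner = owner
        self.outgoing = deque()

    def post(self, message):
        self.outgoing.append(message)


class Scheduler:
    """Picks the next message to transmit (round-robin is an assumption)."""

    def __init__(self, channels):
        self.channels = list(channels)
        self.next_index = 0

    def next_message(self):
        """Return (owner, message) from the next non-empty channel, or None."""
        count = len(self.channels)
        for offset in range(count):
            index = (self.next_index + offset) % count
            channel = self.channels[index]
            if channel.outgoing:
                self.next_index = (index + 1) % count
                return channel.owner, channel.outgoing.popleft()
        return None


channel_1 = LogicalChannel("endpoint 1")
channel_n = LogicalChannel("endpoint n")
channel_1.post("a1")
channel_1.post("a2")
channel_n.post("b1")

scheduler = Scheduler([channel_1, channel_n])
order = []
item = scheduler.next_message()
while item is not None:
    order.append(item)
    item = scheduler.next_message()
print(order)
```

Each endpoint only touches its own queue, so endpoints do not need to coordinate with one another to use the shared NI.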

Page 16

Programming Interfaces: Active Messages

Message specifies function to be executed at receiver
- Similar to remote procedure calls, but lightweight
- Invoke operations at remote resources

Useful for constructing device-specific APIs

Example: Interactions with remote storage controller

[Figure: CPU and storage controller, each behind an NI on the SAN, exchanging AM_fetch_file() and AM_return_file()]
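The storage example can be illustrated with a small handler-dispatch sketch. Only the handler names `AM_fetch_file` and `AM_return_file` come from the slide; the message tuple layout, the `am_send` and `run_network` helpers, and the in-memory `files` table are hypothetical stand-ins for the real network and storage controller.

```python
from collections import deque

# Per-endpoint handler tables: handler name -> function.
handlers = {"cpu": {}, "storage": {}}

# In-flight messages as (destination, source, handler name, argument).
network = deque()


def am_send(destination, source, handler_name, argument):
    """Queue an active message naming the handler to run at the receiver."""
    network.append((destination, source, handler_name, argument))


def run_network():
    """Deliver messages until none remain, running the named handler."""
    while network:
        destination, source, handler_name, argument = network.popleft()
        handlers[destination][handler_name](source, argument)


files = {"clip.raw": b"\x00\x01\x02"}  # hypothetical storage contents
received = {}                          # files that arrived back at the CPU


def am_fetch_file(source, filename):
    # Runs at the storage controller: reply with the requested file.
    am_send(source, "storage", "AM_return_file", (filename, files[filename]))


def am_return_file(source, argument):
    # Runs at the requesting CPU: store the returned data.
    filename, data = argument
    received[filename] = data


handlers["storage"]["AM_fetch_file"] = am_fetch_file
handlers["cpu"]["AM_return_file"] = am_return_file

am_send("storage", "cpu", "AM_fetch_file", "clip.raw")
run_network()
print(received)
```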

Page 17

Programming Interfaces: Remote Memory

Transfer blocks of data from one host to another
- Receiving NI executes transfer directly

Read and Write operations
- NI interacts with kernel driver to translate virtual addresses
- Optional notification mechanisms

[Figure: two hosts, each with CPU, memory, and NI, connected by the SAN]

Page 18

Integrating Peripheral Devices

Hardware Extensibility

Page 19

Peripheral Device Overview

In GRIM peripherals are endpoints

Intelligent peripherals
- Operate autonomously
- On-card message queues
- Process incoming active messages
- Eject outgoing active messages

Legacy peripherals
- Managed by host application or
- Remote memory operations

[Figure labels: NI, CPU, CPU, Peripheral Device, Legacy Peripheral Device]

Page 20

Peripheral Device Examples

Video display card
- Manipulate frame buffer
- Remote memory writes

Server adaptor card
- Networked host on PCI card
- AM handlers for LAN-SAN bridge

Video capture card
- Specialized DMA engine
- AM handlers capture data

[Figure: block diagrams of the three cards. Video display: AGP, frame buffer, D/A. Server adaptor: PCI, i960, Ethernet, SCSI. Video capture: A/D, frame buffer, PCI DMA, host memory]

Page 21

Celoxica RC-1000 FPGA Card

FPGAs provide acceleration
- Load with application-specific circuits

Celoxica RC-1000 FPGA card
- Xilinx Virtex-1000 FPGA
- 8 MB SRAM

Hardware implementation
- Endpoint as state machines
- AM handlers are circuits

[Figure: RC-1000 card with SRAM 0 through SRAM 3, control and switching logic, FPGA, and PCI interface]

Page 22

FPGA Endpoint Organization

[Figure labels: Frame, Input Queues, Output Queues, Communication Library API, Application Data, Memory API, FPGA Card Memory, FPGA Circuit Canvas, User Circuit API, User Circuit 1 through User Circuit n]

Page 23

Example FPGA Configuration

Cryptography configuration
- DES, RC6, MD5, and ALU

Occupies 70% of FPGA
- Newer FPGAs 8x in size

Operates with 20 MHz clock
- Newer FPGAs 6x faster
- 4 KB payload => 55 μs (73 MB/s)
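As a quick check of the payload figure, the arithmetic can be worked through in Python, reading the slide's latency as 55 microseconds (the unit prefix appears to have been lost in transcription, so this is an assumption). The constant names are illustrative only.

```python
CLOCK_HZ = 20e6            # 20 MHz FPGA clock (from the slide)
PAYLOAD_BYTES = 4 * 1024   # 4 KB payload, assuming 1 KB = 1024 bytes
LATENCY_S = 55e-6          # 55 microseconds (assumed unit)

cycles = round(LATENCY_S * CLOCK_HZ)              # clock cycles per operation
bytes_per_cycle = PAYLOAD_BYTES / cycles          # average payload bytes per cycle
bandwidth_mb_s = PAYLOAD_BYTES / LATENCY_S / 1e6  # MB/s with 1 MB = 10**6 bytes

print(cycles, round(bytes_per_cycle, 2), round(bandwidth_mb_s, 1))
```

Under these assumptions the operation takes about 1100 clock cycles and sustains roughly 74 MB/s, close to the 73 MB/s quoted on the slide; the small gap is consistent with rounding in the slide's figures.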

Page 24

Expansion: Sharing the FPGA

FPGA has limited space for hardware circuits
- Host reconfigures FPGA on demand
- FPGA Function Fault

[Figure: function fault example. Labels: host CPU; FPGA; Configurations A, B, and C; Circuits X, Y, E, F, and G; state storage in SRAM-0; incoming message "Use Circuit F"; function fault; (150 ms)]
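The function-fault mechanism resembles demand paging applied to circuits, and can be sketched as follows. The `FpgaCard` class and the mapping of circuits to configurations are hypothetical: the slide names Configurations A to C and Circuits X, Y, E, F, and G, but the grouping used below is only illustrative.

```python
# Illustrative grouping of circuits into configurations (an assumption).
CONFIGURATIONS = {
    "A": {"X", "Y"},
    "B": {"E", "F"},
    "C": {"G"},
}


class FpgaCard:
    """Toy model of an FPGA that holds one configuration at a time."""

    def __init__(self, configurations, initial):
        self.configurations = configurations
        self.loaded = initial
        self.faults = 0

    def invoke(self, circuit):
        """Run a circuit, reconfiguring first if it is not currently loaded."""
        if circuit not in self.configurations[self.loaded]:
            self.function_fault(circuit)
        return f"ran {circuit} in configuration {self.loaded}"

    def function_fault(self, circuit):
        """Find a configuration that provides the circuit and load it."""
        for name, circuits in self.configurations.items():
            if circuit in circuits:
                self.faults += 1
                # A real host would save circuit state and reprogram here.
                self.loaded = name
                return
        raise KeyError(f"no configuration provides circuit {circuit!r}")


card = FpgaCard(CONFIGURATIONS, initial="A")
print(card.invoke("X"))  # already loaded, no fault
print(card.invoke("F"))  # triggers a function fault and a reload
print(card.faults)
```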

Page 25

Extension: Streaming Computations

Software extensibility

Page 26

Streaming Computation Overview

Programming method for distributed resources
- Establish pipeline for streaming operations
- Example: Multimedia processing

[Figure: streaming pipeline over the System Area Network: a video capture node and three media processor nodes, each with CPU and NI; label: Celoxica RC-1000 FPGA endpoint]

Page 27

Streaming Fundamentals

Computation: How is a computation performed?
- Active message approach

Forwarding: Where are results transmitted?
- Programmable forwarding directory

[Figure: FPGA endpoint with computational circuits (Circuit 1: FFT through Circuit N: Encrypt) and a forwarding directory.
In message: Destination: FPGA, Forward Entry: X, AM: Perform FFT.
Out message: Destination: Host, Forward Entry: X, AM: Receive FFT]
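The two streaming questions (how to compute, where to forward) can be sketched together. The message fields mirror the figure (destination, forward entry, AM), but the dictionary layout, the `handle_message` function, and the `scale_by_two` stand-in for the FFT circuit are assumptions for illustration.

```python
def scale_by_two(samples):
    # Hypothetical stand-in for a computational circuit such as the FFT.
    return [2 * s for s in samples]


# Active message name -> computation to run at this endpoint.
CIRCUITS = {"perform_fft": scale_by_two}

# Forward entry -> (next destination, AM to invoke there).
FORWARDING_DIRECTORY = {"X": ("host", "receive_fft")}


def handle_message(message):
    """Run the requested computation, then build the forwarded message."""
    result = CIRCUITS[message["am"]](message["payload"])
    destination, next_am = FORWARDING_DIRECTORY[message["forward_entry"]]
    return {
        "destination": destination,
        "forward_entry": message["forward_entry"],
        "am": next_am,
        "payload": result,
    }


in_message = {
    "destination": "fpga",
    "forward_entry": "X",
    "am": "perform_fft",
    "payload": [1, 2, 3],
}
out_message = handle_message(in_message)
print(out_message)
```

Because the next hop lives in the directory rather than in the computation itself, a pipeline can be rewired by reprogramming directory entries without touching the circuits.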

Page 28

Host-to-Host Performance

Transferring data between two host-level endpoints

Page 29

Host-to-Host Communication Performance

Host-to-Host transfers standard benchmark

Three phases of data transfer
- Injection most challenging

Overall communication path

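[Figure: path from source to destination in three numbered steps: (1) CPU and memory to NI, (2) NI across the SAN to NI, (3) NI to CPU and memory; labeled for active messages and remote memory operations]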

Page 30

Host-NI: Data Injections

Host-NI transfers challenging
- Host lacks DMA engine

Multiple transfer methods
- Programmed I/O
- DMA

Automatically select method

Result: Tunable PCI Injection Library (TPIL)

[Figure: host CPU with cache, memory controller, and main memory on one side of the PCI bus; peripheral device with PCI DMA engine and memory on the other]
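One way to picture automatic method selection is a simple cost model that picks whichever transfer method is expected to finish first for a given size. This is not TPIL's actual algorithm; the bandwidth and setup-cost parameters below are hypothetical values chosen only to show a crossover between programmed I/O and DMA.

```python
def pio_time_us(size_bytes, pio_mb_s=40.0):
    """Programmed I/O: no setup cost, lower sustained bandwidth (assumed)."""
    return size_bytes / pio_mb_s  # bytes / (MB/s) gives microseconds


def dma_time_us(size_bytes, setup_us=10.0, dma_mb_s=120.0):
    """DMA: fixed setup cost, higher sustained bandwidth (assumed)."""
    return setup_us + size_bytes / dma_mb_s


def select_method(size_bytes):
    """Pick the method with the lower modeled injection time."""
    if pio_time_us(size_bytes) <= dma_time_us(size_bytes):
        return "pio"
    return "dma"


for size in (64, 512, 1024, 4096):
    print(size, select_method(size))
```

With these made-up parameters the crossover sits at 600 bytes; a tunable library would instead derive its thresholds from measurements on the actual host and NI.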

Page 31

TPIL Performance: LANai 9 NI with Pentium III-550 MHz Host

[Plot: bandwidth (MBytes/s) versus injection size (bytes)]

Page 32

Overall Communication Pipeline

Three phases of transmission
- Optimization: Use fragmentation to increase utilization
- Optimization: Allow cut-through transmissions

[Figure: timelines of the three phases (sending host-NI, NI-NI, receiving NI-host) for Messages 1 to 3, comparing overall transmission time as the phases overlap]

Page 33

Overall Host-to-Host Performance

Host        NI        Latency (μs)   Bandwidth (MB/s)
P4-1.7GHz   LANai 9   8              146
P4-1.7GHz   LANai 4   14.5           108
P3-550MHz   LANai 9   9.5            116
P3-550MHz   LANai 4   14             96

[Plot: bandwidth (MBytes/s) versus message size (bytes)]

Page 34

Comparison to Existing Message Layers

[Charts: latency (μs) and bandwidth (MB/s) of the compared message layers]

Page 35

Concluding Remarks

Page 36

Key Contributions

Framework for communication in resource-rich clusters
- Reliable delivery mechanisms, virtualized network interface, and flexible programming interfaces
- Comparable performance to state-of-the-art message layers

Extensible for peripheral devices
- Suitable for intelligent and legacy peripherals
- Methods for managing card resources

Extensible for higher-level programming abstractions
- Endpoint-level: Streaming computations and sockets emulation
- NI-level: Multicast support

Page 37

Future Directions

Continued work with GRIM
- Video card vendors opening cards to developers
- Myrinet connected embedded devices

Adaptation to other network substrates
- Gigabit Ethernet appealing because of cost
- Modification to transmission protocols
- InfiniBand technology promising

Active system area networks
- FPGA chips beginning to feature gigabit transceivers
- Use FPGA chips as networked processing device

Page 38

Additional Research Projects

Page 39

Wireless Sensor Networks

NASA JPL Research
- In-situ WSNs
- Exploration of Mars

Communication
- Self organization
- Routing

SensorSim
- Java simulator
- Evaluate protocols

Page 40

PeZ: Pole-Zero Editor for MATLAB

Page 41

Related Publications

A Tunable Communications Library for Data Injection, C. Ulmer and S. Yalamanchili, Proceedings of Parallel and Distributed Processing Techniques and Applications, 2002.

Active SANs: Hardware Support for Integrating Computation and Communication, C. Ulmer, C. Wood, and S. Yalamanchili, Proceedings of the Workshop on Novel Uses of System Area Networks at HPCA, 2002.

A Messaging Layer for Heterogeneous Endpoints in Resource Rich Clusters, C. Ulmer and S. Yalamanchili, Proceedings of the First Myrinet User Group Conference, 2000.

An Extensible Message Layer for High-Performance Clusters, C. Ulmer and S. Yalamanchili, Proceedings of Parallel and Distributed Processing Techniques and Applications, 2000.

Papers and Software Available at

http://www.CraigUlmer.com/research

Page 42

Backup Slides

Page 43

Performance: FPGA Computations

Stage                 Clocks
Acquire SRAM          8
Detect New Message    4
Fetch Header          7
Computation           1024
Store Results         1024
Store Header          16
Lookup Forwarding     5
Update Queues         3
Release SRAM          1
Fetch Payload         1024

Clock speed: 20 MHz

Operation latency: 55 μs (4 KB, 73 MB/s)

Page 44

[Figure: FPGA endpoint datapath. Labels: SRAM-0 (incoming queues), SRAM-1 (user page 0), SRAM-2 (user page 1), SRAM-3 (outgoing queues); Port A, Port B, Port C; fetch/decode; two scratchpad controllers; built-in ALU ops; results cache; message generator; control/status port]

Page 45

Expansion: Sharing On-Card Memory

Limited on-card memory for storing application data
- Construct virtual memory system for on-card memory
- Swap space is host memory

[Figure: page fault example. Labels: host CPU; FPGA with user-defined circuits; Page Frame 1 (SRAM-1); Page Frame 2 (SRAM-2); User Page X; page fault]

Page 46

RC-1000 Challenges

Hardware implementation
- Queue state machines

Memory locking
- SRAM single ported
- Arbitrate for use

CPU / NI contention
- NI manages FPGA lock

[Figure: CPU and NI arbitrating for the memory lock on the FPGA card's SRAM, alongside the user circuits]

Page 47

Example: Autonomous Spaceborne Clusters

NASA Remote Exploration and Experimentation
- Spaceborne vehicle processes data locally
- Clusters in the sky

Number of peripheral devices
- Data sensors
- FPGA & DSPs

Adaptive hardware
- Modify functionality after deployment

Page 48

Performance: Card Interactions

Acquire FPGA SRAM
- CPU-NI: 20 μs
- NI: 8 μs

Inject 4 KB message to FPGA
- CPU: 58 μs (70 MB/s)
- NI: 32 μs (128 MB/s)

Release FPGA SRAM
- CPU-NI: 8 μs
- NI: 5 μs

[Figure: CPU and NI accessing the FPGA card's SRAM through the memory lock, alongside the user circuits]

Page 49

Example: Digital Libraries

Enormous amount of data and users
- Intelligent LAN and storage cards to manage requests

[Figure: three server nodes on a SAN backbone, each with CPU, intelligent LAN adaptor, storage adaptor, and SAN NI, holding files A-H, I-R, and S-Z; six clients connect to the nodes]

Page 50

Cyclone Systems I2O Server Adaptor Card

Networked host on a PCI card

Integration with GRIM
- Interact directly with the NI
- Ported host-level endpoint software

Utilized as a LAN-SAN bridge

[Figure: block diagram of the card. Labels: host system, primary PCI interface, i960 Rx processor, DMA engines, DRAM, ROM, local bus, secondary PCI interface, daughter card, 10/100 Ethernet (x2), SCSI (x2)]

Page 51

GRIM Multicast Extensions

Distribute the same message to multiple receivers
- Tree based distributions
- Replicate message at NI
- Messages are recycled back into network

Extensions to NI’s core communication operations
- Recycled messages in separate logical channel
- Utilize per-hop flow control for reliable delivery

[Figure: multicast tree over endpoints A, B, C, D, and E, and the corresponding message paths between their NIs]
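Tree-based distribution can be sketched as follows. The tree shape (A feeding B and C, B feeding D and E) is an assumption based on the figure's node names, and the `multicast` function is illustrative: it models each NI delivering a message locally and recycling copies to its children, without modeling logical channels or flow control.

```python
from collections import deque

# Assumed tree shape: parent -> children.
TREE = {"A": ["B", "C"], "B": ["D", "E"]}


def multicast(root, message, tree):
    """Return (messages delivered per receiver, copies sent per node)."""
    delivered = {}
    copies_sent = {}
    pending = deque([root])
    while pending:
        node = pending.popleft()
        if node != root:
            delivered[node] = message  # NI hands the message to its endpoint
        children = tree.get(node, [])
        copies_sent[node] = len(children)  # NI replicates once per child
        pending.extend(children)
    return delivered, copies_sent


delivered, copies_sent = multicast("A", b"frame", TREE)
print(sorted(delivered), copies_sent)
```

In this sketch the sender's NI emits two copies rather than four, which illustrates why tree-based multicast reduces sending overhead at the source.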

Page 52

Multicast Performance

[Plot: time (μs, 1 to 100,000) versus multicast message size (bytes, 1 to 1,000,000); series: multicast RTT, unicast RTT, multicast injection overhead, unicast injection overhead; 8 hosts, LANai 4, P4-1.7 GHz hosts]

Page 53

Multicast Observations

Beneficial: reduces sending overhead

Performance loss for large messages
- Dependent on NI memory copy bandwidth

On-card memory copy benchmark:
- LANai 4: 19 MB/s
- LANai 9: 66 MB/s

Page 54

Extension: Sockets Emulation

Berkeley sockets is a communication standard
- Utilized in numerous distributed applications

GRIM provides sockets API emulation
- Functions for intercepting socket calls
- AM handler functions for buffering connection data

[Figure: on the sender, write() is intercepted and generates an AM ("Append Socket X"); on the receiver, the Append Socket AM handler stores socket data for Socket X, and an intercepted read() extracts the data]
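The interception flow can be sketched with in-memory buffers. The function names `emulated_write`, `emulated_read`, and `am_append_socket` are hypothetical, and the direct function call stands in for sending an active message across the network.

```python
from collections import defaultdict

# Receiver-side buffers: socket id -> bytes that have arrived but are unread.
socket_buffers = defaultdict(bytearray)


def am_append_socket(socket_id, data):
    """AM handler at the receiver: buffer data for the named socket."""
    socket_buffers[socket_id].extend(data)


def emulated_write(socket_id, data):
    """Intercepted write(): generate an AM instead of entering the kernel."""
    am_append_socket(socket_id, data)  # stands in for sending the AM
    return len(data)


def emulated_read(socket_id, max_bytes):
    """Intercepted read(): extract up to max_bytes of buffered data."""
    buffer = socket_buffers[socket_id]
    chunk = bytes(buffer[:max_bytes])
    del buffer[:max_bytes]
    return chunk


emulated_write(7, b"hello ")
emulated_write(7, b"world")
first = emulated_read(7, 5)
rest = emulated_read(7, 100)
print(first, rest)
```

A real emulation would also need to block or return an error when no data is available, and to handle connection setup and teardown, which this sketch leaves out.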

Page 55

Sockets Emulation Performance

[Plot: bandwidth (MBytes/s, 0 to 120) versus transfer size (bytes, 1 to 10,000,000); series: GRIM sockets on LANai 4, 100 Mb/s Ethernet; P4-1.7 GHz hosts]

Page 56

Overall Performance: Store-and-Forward

Approach: Single message, no overlap
- Three transmission stages
- Expect roughly 1/3 of bandwidth of individual stage

[Figure: timeline of Message 1 passing in turn through sending host-NI (PCI: 132 MB/s), NI-NI (Myrinet: 160 MB/s), and receiving NI-host (PCI: 132 MB/s), with the overall transmission time marked; plot of bandwidth (MBytes/s) versus message size (bytes), P3-550 MHz hosts]
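The "roughly 1/3" expectation can be checked with the link rates quoted on this slide. When stages run one after another, their per-byte times add, so the overall bandwidth is the reciprocal of the summed reciprocals. The function name and dictionary layout are illustrative, and per-message overheads are ignored.

```python
# Peak stage bandwidths in MB/s, as quoted on the slide.
STAGE_BANDWIDTH_MB_S = {
    "sending host-NI (PCI)": 132.0,
    "NI-NI (Myrinet)": 160.0,
    "receiving NI-host (PCI)": 132.0,
}


def store_and_forward_bandwidth(stage_bandwidths):
    """Overall bandwidth when stages run strictly one after another."""
    time_per_mb = sum(1.0 / bandwidth for bandwidth in stage_bandwidths)
    return 1.0 / time_per_mb


overall = store_and_forward_bandwidth(STAGE_BANDWIDTH_MB_S.values())
print(round(overall, 1))
```

This idealized bound comes out a little under 47 MB/s, which is about a third of a single stage's rate, in line with the slide's expectation.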

Page 57

Enhancement: Message Pipelining

Allow overlap with multiple in-flight messages
- GRIM uses AM and RM fragmentation/reassembly
- Performance depends on fragment size

[Figure: timeline of Messages 1 to 3 overlapping across the sending host-NI, NI-NI, and receiving NI-host stages, with the overall transmission time marked; plot of bandwidth (MBytes/s) versus message size (bytes), LANai 9, P3-550 MHz hosts]
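A small timing model shows why fragment size matters once stages overlap. The stage bandwidths reuse the PCI and Myrinet rates quoted on the store-and-forward slide; the 2 μs per-fragment overhead and the function name are assumptions made only to illustrate the trade-off.

```python
import math


def pipelined_time_us(message_bytes, fragment_bytes,
                      stage_mb_s=(132.0, 160.0, 132.0),
                      per_fragment_overhead_us=2.0):
    """Time for a fragmented message to clear a three-stage pipeline."""
    count = math.ceil(message_bytes / fragment_bytes)
    last_size = message_bytes - fragment_bytes * (count - 1)
    sizes = [fragment_bytes] * (count - 1) + [last_size]
    stage_free = [0.0] * len(stage_mb_s)  # when each stage is next idle
    for size in sizes:
        ready = 0.0  # all data is available at the sending host at time 0
        for index, bandwidth in enumerate(stage_mb_s):
            start = max(ready, stage_free[index])
            # bytes / (MB/s) gives microseconds
            ready = start + per_fragment_overhead_us + size / bandwidth
            stage_free[index] = ready
    return stage_free[-1]


MESSAGE_BYTES = 64 * 1024
for fragment in (64, 4096, MESSAGE_BYTES):
    elapsed = pipelined_time_us(MESSAGE_BYTES, fragment)
    print(fragment, round(MESSAGE_BYTES / elapsed, 1), "MB/s")
```

In this model a single full-size fragment behaves like store-and-forward, very small fragments are dominated by per-fragment overhead, and a mid-sized fragment performs best.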

Page 58

Enhancement: Cut-through Transfers

Forward data as soon as it begins to arrive
- Cut-through at sending and receiving NIs

[Figure: timeline of Messages 1 and 2 with cut-through across the sending host-NI, NI-NI, and receiving NI-host stages, with the overall transmission time marked; plot of bandwidth (MBytes/s) versus message size (bytes), LANai 9, P3-550 MHz hosts]