Institute of Computer Science
Foundation for Research and Technology – Hellas
Greece
Computer Architecture and VLSI Systems Laboratory
Exploiting Spatial Parallelism in Ethernet-based Cluster
Interconnects
Stavros Passas, George Kotsis, Sven Karlsson, and Angelos
Bilas
FORTH-ICS CARV/scalable 2
Motivation
Typically, clusters today use multiple interconnects:
- Interprocess communication (IPC): Myrinet, InfiniBand, etc.
- I/O: Fibre Channel, SCSI
- Fast LAN: 10 GigE
However, this increases system and management cost.
Can we use a single interconnect for all types of traffic? Which one?
High network speeds: 10-40 Gbit/s
Trends and Constraints
Most interconnects use a similar physical layer, but differ in:
- Protocol semantics and the guarantees they provide
- Protocol implementation on the NIC and in the network core
Higher-layer protocols (e.g., TCP/IP, NFS) are independent of the interconnect technology.
10+ Gbps Ethernet is particularly attractive, but …
- Typically associated with higher overheads
- Requires more support at the edge due to a simpler network core
This Work
How well can a protocol do over 10-40 GigE?
- Scale throughput efficiently over multiple links
- Analyze protocol overhead at the host CPU
- Propose and evaluate optimizations for reducing host CPU overhead, implemented without H/W support
Outline
- Motivation
- Protocol design over Ethernet
- Experimental results
- Conclusions and future work
Standard Protocol Processing
Sources of overhead:
- System call to issue an operation
- Memory copies at sender and receiver
- Protocol packet processing
- Interrupt notification for freeing the send-side buffer and for packet arrival
- Extensive device accesses
- Context switch from interrupt to receive thread for packet processing
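A quick back-of-envelope calculation shows why these per-packet costs matter. Assuming standard 1500-byte Ethernet frames and ignoring headers, preamble, and the inter-frame gap (so the real budget is slightly larger), the per-packet time budget at 10 Gbit/s is:

```python
# Back-of-envelope per-packet time budget at 10 Gbit/s.
# Assumes 1500-byte frames; ignores Ethernet headers, preamble,
# and inter-frame gap, so real budgets are slightly larger.

LINK_RATE_BPS = 10e9      # 10 Gbit/s
FRAME_BYTES = 1500        # standard Ethernet MTU

packets_per_sec = LINK_RATE_BPS / (FRAME_BYTES * 8)
budget_us = 1e6 / packets_per_sec

print(f"{packets_per_sec:,.0f} packets/s")   # ~833,333 packets/s
print(f"{budget_us:.2f} us per packet")      # ~1.20 us
```

At roughly 1.2 µs per packet, a single interrupt or context switch (each often costing microseconds) can consume the entire budget, which is why the overheads listed above dominate at these rates.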
Our Base Protocol
Improves on MultiEdge [IPDPS'07]:
- Support for multiple links with different schedulers
- H/W coalescing for send- and receive-side interrupts
- S/W coalescing in the interrupt handler
Still requires:
- System calls
- One copy at the send side and one at the receive side
- Context switch in the receive path
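The S/W coalescing idea can be sketched as follows. This is a simplified illustration, not the actual MultiEdge code, and the names (`rx_queue`, `handle_interrupt`) are made up for the example. The point is that one interrupt drains every pending packet, so the fixed interrupt cost is amortized over the whole batch:

```python
# Simplified sketch of software interrupt coalescing in the receive
# path: one handler invocation drains all queued packets instead of
# taking one interrupt per packet. Names are illustrative.
from collections import deque

class CoalescingReceiver:
    def __init__(self):
        self.rx_queue = deque()   # packets the NIC has queued
        self.interrupts = 0       # interrupts taken
        self.processed = 0        # packets handled

    def packet_arrives(self, pkt):
        # NIC places the packet in the receive ring; with H/W
        # coalescing the interrupt is deferred until a batch exists.
        self.rx_queue.append(pkt)

    def handle_interrupt(self):
        self.interrupts += 1
        while self.rx_queue:      # S/W coalescing: drain everything
            self.rx_queue.popleft()
            self.processed += 1

rx = CoalescingReceiver()
for i in range(8):
    rx.packet_arrives(i)          # burst of 8 back-to-back packets
rx.handle_interrupt()             # a single interrupt covers them all
print(rx.interrupts, rx.processed)    # 1 8
```

Without coalescing, the same burst would cost eight interrupts, each paying the full entry/exit overhead from the list on the previous slide.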
Evaluation Methodology
Research questions:
- How does the protocol scale with the number of links?
- What are the important overheads at 10 Gbit/s?
- What is the impact of link scheduling?
We use two nodes connected back-to-back:
- Dual-CPU (Opteron 244)
- 1-8 links of 1 Gbit/s (Intel)
- 1 link of 10 Gbit/s (Myricom)
We focus on:
- Throughput: end-to-end, as reported by benchmarks
- Detailed CPU breakdowns: extensive kernel
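The link-scheduling question can be made concrete with a small sketch. The slides do not specify which schedulers the protocol supports, so the two policies below (round-robin and join-shortest-queue) are generic illustrations of how packets might be striped across several 1 Gbit/s links:

```python
# Two generic link-scheduling policies for striping packets (given
# as byte sizes) across multiple links. Illustrative only; these are
# not necessarily the schedulers evaluated in the talk.
from itertools import cycle

def round_robin(packets, n_links):
    """Assign packets to links in strict rotation."""
    queues = [[] for _ in range(n_links)]
    links = cycle(range(n_links))
    for pkt in packets:
        queues[next(links)].append(pkt)
    return queues

def shortest_queue(packets, n_links):
    """Assign each packet to the link with the fewest queued bytes;
    this adapts when packet sizes vary."""
    queues = [[] for _ in range(n_links)]
    for pkt in packets:
        target = min(queues, key=lambda q: sum(q))
        target.append(pkt)
    return queues

# Uniform packets: round-robin balances perfectly (2 per link).
print(round_robin([1500] * 8, 4))
# Mixed sizes: shortest-queue evens out per-link byte counts,
# ending with 9000 bytes queued on each of the two links.
print(shortest_queue([9000, 1500, 1500, 1500, 1500, 1500, 1500], 2))
```

With the mixed-size workload above, round-robin would queue 13500 bytes on one link and 4500 on the other, while shortest-queue keeps both at 9000 bytes, which is why the scheduling policy can matter for throughput scaling.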