Myricom, Inc. 325 N. Santa Anita Ave. Arcadia CA 91006 626-821-5555 Fax: 626-821-5316 http://www.myri.com
Scalable Cluster Interconnect Overview and Technology Roadmap
Charles L. Seitz [email protected]
Linux Superclusters Users Conference, Albuquerque, NM, 13 September 2000
Myrinet Technology “in the Large”
Sandia National Laboratory Cplant™
2,576 Compaq Alpha Personal Workstations, 400 EV-5 + 768 EV-6 + 1408 EV-6, but not all in one cluster.
Compaq Custom Systems was the integrator. The system was built in three phases, in the summers of 1998, 1999, and 2000.
Cplant originally used 16-port Myrinet switches in each 8-host cabinet. The latest increment uses a mesh variant of the M2LM-Clos64 “Network in a Box” products for switching.
(Photo adapted from http://www.cs.sandia.gov/cplant/)
Myrinet Technology “in the Small”
CSPI Quad-PowerPC VME Signal-Processing Board
This CSPI two-level-multicomputer product uses the Myricom LANai-5 chip to interface the PowerPCs to the message-passing network.
This single-width VME board includes a packet-switched Myrinet network interconnecting the 4 nodes on the board and 4 external ports with an 8-port Myrinet switch (a chip not visible in this photo).
Myrinet is defined at the Data-Link level (level 2 of the ISO reference model for computer networks) by its packet format and flow control. Think of Myrinet as the simplest packet-switched network you can devise.
Packet format (bytes, in transmission order):
Source route: used by the switches, which strip the bytes as they are used
Type: allows multiple protocols on one Myrinet
Payload: any length
CRC
http://www.myri.com/open-specs/
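The source-route idea can be illustrated with a small Python sketch. It is not the Myrinet wire format: the 2-byte type width, the use of a route byte as a plain output-port number, the CRC-8 polynomial, and the choice to checksum only the type and payload are all assumptions made for illustration, and the function names are hypothetical. The open specifications remain the authority on the real encoding.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8. Stand-in checksum; the polynomial is an assumption."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_packet(route: list[int], ptype: int, payload: bytes) -> bytes:
    """Lay out: route bytes | 2-byte type (assumed width) | payload | CRC."""
    body = ptype.to_bytes(2, "big") + payload
    return bytes(route) + body + bytes([crc8(body)])


def switch_forward(packet: bytes) -> tuple[int, bytes]:
    """Model one switch: consume the leading route byte, forward the rest."""
    out_port = packet[0]
    return out_port, packet[1:]


def host_receive(packet: bytes) -> tuple[int, bytes]:
    """Model the destination host: check the CRC, return (type, payload)."""
    body, crc = packet[:-1], packet[-1]
    if crc8(body) != crc:
        raise ValueError("CRC mismatch")
    return int.from_bytes(body[:2], "big"), body[2:]
```

A packet built with `build_packet([3, 12, 5], 4, b"hello")` passes through three calls to `switch_forward`, each of which removes one route byte, so the destination host sees only the type, payload, and CRC.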
There are multiple Physical-level implementations.
Clos network of 16 16-port switches, with 64 LAN host ports and 64 SAN inter-switch ports.
Full (maximal) bisection data rate between the 64 host ports = 32 links (41+41 Gb/s). Data rate between the host ports and the inter-switch ports = 64 links (82+82 Gb/s).
160 Watts, 12U rack mount size.
SNMP/Ethernet monitoring and control, with the full set of Myrinet high-availability features.
$40K US-list.
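The port counts and data rates quoted for this 64-host network can be checked with a few lines of Python. This is a sketch under stated assumptions: the per-direction link rate of 1.28 Gb/s is not given on the slide and is inferred here from 41 Gb/s over 32 links, and the even split of each switch's ports between "down" and "up" is assumed because it reproduces the 64 + 64 port counts. The function name is hypothetical.

```python
def two_level_clos(switch_ports: int, num_switches: int, link_gbps: float) -> dict:
    """Port and bandwidth counts for a two-level Clos of identical switches.

    Assumes half the switches sit at each level and every switch splits
    its ports evenly between downward and upward links.
    """
    per_level = num_switches // 2
    half = switch_ports // 2
    host_ports = per_level * half          # lower level, downward ports
    inter_switch_ports = per_level * half  # upper level, upward ports
    bisection_links = host_ports // 2      # half the hosts talk to the other half
    return {
        "host_ports": host_ports,
        "inter_switch_ports": inter_switch_ports,
        "bisection_links": bisection_links,
        "bisection_gbps_each_way": bisection_links * link_gbps,
        "host_to_inter_switch_gbps_each_way": host_ports * link_gbps,
    }


# 16 switches of 16 ports each; 1.28 Gb/s per direction is an inferred value.
summary = two_level_clos(switch_ports=16, num_switches=16, link_gbps=1.28)
```

Under these assumptions the sketch gives 64 host ports, 64 inter-switch ports, and 32 bisection links, with data rates that round to the 41 and 82 Gb/s figures above.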
Myrinet 2000 – Third-Generation Myrinet
This evolutionary step improves the links at the Physical level (both the performance and the “look and feel” of Myrinet) and introduces interfaces with 1.7x and 2.5x faster RISCs, but Myrinet-2000 is compatible with 2nd-generation Myrinet at the Data Link level and in the software. (Don’t try to innovate along too many dimensions at once! This is a technology push, not an architecture change.)
This family of products supports hot-plugging of line cards, fans, and dual redundant power supplies. Microcomputer monitoring (SNMP over Ethernet) provides extensive diagnostic capabilities and the management features needed for high-availability applications.
Different types of line cards have Serial, Fiber, SAN, or legacy LAN ports
[Figure: The spine of the Clos network (the backplane) forms a Clos spreader network connecting 16 line cards of 8 hosts each, giving ports to up to 128 hosts.]
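A rough Python model of this enclosure shows how routes between line cards can be spread over the spine. The 16 line cards and 8 hosts per card come from the figure; the count of 8 spine switches is an assumption, since the slide does not state it, and all names below are hypothetical.

```python
import random

HOSTS_PER_LINE_CARD = 8  # from the figure
LINE_CARDS = 16          # from the figure: 16 x 8 = 128 hosts
SPINE_SWITCHES = 8       # assumption: not stated on the slide


def line_card_of(host: int) -> int:
    """Map a host number (0..127) to the line card it plugs into."""
    if not 0 <= host < HOSTS_PER_LINE_CARD * LINE_CARDS:
        raise ValueError("host out of range")
    return host // HOSTS_PER_LINE_CARD


def routes(src: int, dst: int) -> list[list[str]]:
    """All shortest switch-level paths between two hosts."""
    a, b = line_card_of(src), line_card_of(dst)
    if a == b:
        return [[f"card{a}"]]  # same line card: a single switch hop
    # Different line cards: up to any spine switch, then down again.
    return [[f"card{a}", f"spine{s}", f"card{b}"] for s in range(SPINE_SWITCHES)]


def pick_route(src: int, dst: int, rng: random.Random) -> list[str]:
    """Choose one of the equivalent paths at random."""
    return rng.choice(routes(src, dst))
```

In this model two hosts on different line cards have as many equal-length paths as there are spine switches, and `pick_route` selects among them uniformly.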
Add the optional monitoring line card to provide SNMP/Ethernet monitoring and control. The monitoring line card includes a microcontroller and dual Ethernet ports. All line cards are interchangeable across the product family.
Why Clos Networks?
- Maximal performance under arbitrary traffic patterns
  - Minimum bisection is the largest possible
  - “Rearrangeable Network” (can route any permutation)
  - Network looks the same from any host (simplifies cluster management)
- Multiple paths
  - All progressive routes are deadlock-free
  - Use multiple paths for redundancy
  - Use multiple paths to avoid hot spots (random dispersion)
- Scales well. For n hosts (minimum bisection = n/2):
  - Diameter varies as log(n)
  - Cost varies as n log(n)
- Modular
  - Economies of sharing the power supply and microcontroller between many switches, and implementing many of the inter-switch links on circuit boards rather than cables.
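The scaling claims can be made concrete with a simplified Python model. It assumes that every switch uses half its ports downward and that each level holds hosts/(ports/2) switches, which reproduces the 16-switch, 64-host network described earlier. Real enclosures may arrange the top level differently, so for host counts that are not powers of ports/2 the model is only a rough bound. The function name is hypothetical.

```python
def clos_size(hosts: int, switch_ports: int = 16) -> dict:
    """Levels, switch count, and diameter under the simplified model."""
    half = switch_ports // 2
    # Each added level multiplies the reachable host count by `half`,
    # so the number of levels grows as log(hosts).
    levels, reach = 1, half
    while reach < hosts:
        reach *= half
        levels += 1
    switches_per_level = -(-hosts // half)  # ceiling division
    return {
        "levels": levels,
        # (hosts / half) switches per level times log(hosts) levels: n log(n)
        "switches": levels * switches_per_level,
        # Longest shortest path: up through every level and back down.
        "diameter_switch_hops": 2 * levels - 1,
    }


for n in (8, 64, 512, 4096):
    print(n, clos_size(n))
```

With 16-port switches, each factor-of-8 increase in hosts adds one level and two switch hops to the diameter, while the switch count grows slightly faster than linearly.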