Multiterabit Switch Fabrics Enabled by Proximity Communication
Hans Eberle, Alex Chow, Bill Coates, Jack Cunningham, Robert Drost, Jo Ebergen, Scott Fairbanks, Jon Gainsley, Nils Gura, Ron Ho, David Hopkins, Ashok Krishnamoorthy, Jon Lexau, Wladek Olesinski, Tarik Ono, Justin Schauer
Sun Microsystems Laboratories
Hot Chips 19 © Sun Microsystems, Inc.
Future Interconnect Needs
- The interconnect is becoming an increasingly critical system component
  - Fatter compute nodes
  - Increasing disparity between local and remote communication
- Data center trends
  - Server consolidation
  - Network consolidation
  - Virtualization
  - Clustering
  - Horizontal scale beyond the chassis
Proximity Communication (PxC)
[Figure: Chips 1, 2 and 3 overlapping face-to-face, with transmit (Tx) micropads, receive (Rx) pads, and X and Y Vernier alignment structures. Labeled dimensions: 300 µm, 0-20 µm, 5-20 mm.]
- Chips overlap face-to-face
- Capacitively couple over micron distances
- Utilize on-chip electronic alignment
Removing the Chip IO Bottleneck
[Figure, "Huge Bandwidth Gain": I/O lanes per mm² (log scale, 10 to 1000) by year, 2003 to 2009, for Proximity I/O and area ball bonding. "Comparison of Scale": area ball bonding at 120 µm vs. Proximity Communication at 15 µm. Proximity Communication: 10 Tbps per mm².]
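To put the two scales in the figure side by side, the sketch below converts a pad pitch into I/O lanes per mm². It assumes (this is our assumption; the slide only labels the two scales) that 120 µm and 15 µm are the pitches of a square pad grid; `lanes_per_mm2` is a hypothetical helper.

```python
# Hypothetical helper: I/O lanes per mm^2 for pads laid out on a square grid.
# ASSUMPTION: 120 um and 15 um are square-grid pad pitches.
def lanes_per_mm2(pitch_um: float) -> float:
    pads_per_mm = 1000.0 / pitch_um  # pads along one 1 mm edge
    return pads_per_mm ** 2          # pads in a 1 mm x 1 mm area

ball = lanes_per_mm2(120.0)  # area ball bonding
pxc = lanes_per_mm2(15.0)    # Proximity Communication
print(f"area ball bonding:       {ball:8.1f} lanes/mm^2")
print(f"Proximity Communication: {pxc:8.1f} lanes/mm^2")
print(f"density gain:            {pxc / ball:8.1f}x")
```

Under that assumption, density scales with the inverse square of the pitch, so the gain is (120/15)² = 64x.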
Proximity Communication Advantages
- Increases bandwidth/area
- Avoids off-chip wires
- Obviates ESD protection
- Shrinks transceiver circuits
- Lowers power consumption
- Makes multi-chip modules reworkable
- Enables smaller chips
Opportunity
- Proximity Communication allows for building switch fabrics that scale to thousands of ports and multiple Tbps of throughput using a flat single-stage network rather than a hierarchical multi-stage network
[Figure: switch bisection bandwidth spanning Chip 1, Chip 2 and Chip 3.]
Blocking Multi-stage Switch
[Figure: 288-port, three-stage network. Stage 1: switches S1,1 to S1,24 (12x12), serving ports 1-12, 13-24, ..., 277-288. Stage 2: switches S2,1 to S2,12 (24x24). Stage 3: switches S3,1 to S3,24 (12x12), serving output ports 1-12, 13-24, ..., 277-288.]
- 36 switches
- 3 stages
- 576 internal links
(Folded network combines S1 and S3)
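The counts on this slide follow from the usual three-stage Clos parameters. The sketch below uses the conventional names n (external ports per first-stage switch), m (number of middle-stage switches) and r (number of first-stage switches); the class and method names are illustrative, not taken from the talk.

```python
from dataclasses import dataclass

@dataclass
class ThreeStage:
    n: int  # external ports per first-stage (S1) switch
    m: int  # middle-stage (S2) switches, i.e. uplinks per S1 switch
    r: int  # number of S1 switches; stage 3 (S3) mirrors stage 1

    def ports(self) -> int:
        return self.n * self.r

    def internal_links(self) -> int:
        # Every S1 switch links to every S2 switch, and every S2 to every S3.
        return 2 * self.r * self.m

    def switch_count(self, folded: bool) -> int:
        # Folding merges each S1 switch with its S3 partner into one element.
        edge = self.r if folded else 2 * self.r
        return edge + self.m

blocking = ThreeStage(n=12, m=12, r=24)
print(blocking.ports(), blocking.switch_count(folded=True), blocking.internal_links())
```

With folding, each 12x12 S1/S3 pair fits in one 24x24 element, giving 24 + 12 = 36 switches; unfolded, the same network would need 60.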
Non-blocking Multi-stage Switch
[Figure: 288-port, three-stage network. Stage 1: switches S1,1 to S1,24 (12x24), serving ports 1-12, 13-24, ..., 277-288. Stage 2: switches S2,1 to S2,24 (24x24). Stage 3: switches S3,1 to S3,24 (24x12), serving output ports 1-12, 13-24, ..., 277-288.]
- 72 switches
- 3 stages
- 1,152 internal links
n x m expansion: m ≥ 2n - 1
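The expansion rule is Clos's condition for a strictly non-blocking three-stage network: with n inputs per first-stage switch, at least 2n - 1 middle-stage switches are needed. A short check against the parameters in the two figures (n = 12 with m = 12 or m = 24) follows; the function names are our own.

```python
def min_middle_switches(n: int) -> int:
    """Fewest middle-stage switches for a strictly non-blocking 3-stage
    Clos network whose first-stage switches each have n inputs."""
    return 2 * n - 1

def is_strictly_nonblocking(n: int, m: int) -> bool:
    return m >= min_middle_switches(n)

n, r = 12, 24  # 12 ports per S1 switch, 24 S1 switches (288 ports)
print(min_middle_switches(n))          # 23
print(is_strictly_nonblocking(n, 12))  # False: the 12-middle-switch design
print(is_strictly_nonblocking(n, 24))  # True: the 24-middle-switch design

m = 24
internal_links = 2 * r * m  # S1->S2 plus S2->S3 links
switches = r + m + r        # S1, S2 and S3 counted separately (unfolded)
print(internal_links, switches)  # 1152 72
```

With n = 12 the rule requires 23 middle switches; the design uses 24. Because the 12x24 S1 and 24x12 S3 switches are counted separately, the total is 24 + 24 + 24 = 72.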
Proximity Communication Switch
[Figure: 288 ports served by switch elements S1,1 to S1,12 (24x24 each), covering ports 1-24, 25-48, ..., 265-288, interconnected through PxC links in a single stage.]
- 12 switches
- 1 stage
- PxC links
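The port labels in the figure suggest that each of the 12 elements owns a contiguous block of 24 ports. Under that assumption, the sketch maps a global port number to its switch element and local port; `element_for_port` and the constants are hypothetical names.

```python
PORTS_PER_ELEMENT = 24  # from the figure: S1,1 serves ports 1-24
NUM_ELEMENTS = 12       # S1,1 .. S1,12

def element_for_port(port: int) -> tuple[int, int]:
    """Map a 1-based global port to (element, local port), both 1-based.

    ASSUMPTION: ports are assigned to elements in contiguous blocks of 24.
    """
    if not 1 <= port <= PORTS_PER_ELEMENT * NUM_ELEMENTS:
        raise ValueError(f"port {port} out of range")
    element, local = divmod(port - 1, PORTS_PER_ELEMENT)
    return element + 1, local + 1

for p in (1, 24, 25, 48, 265, 288):
    print(p, element_for_port(p))
```

The same 288 ports that needed 36 or 72 switches in the multi-stage designs are covered here by 12 elements.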
Vector Multi-Chip Module
[Figure: multi-chip module with switch elements (Island chips), PxC links (Bridge chips) and off-module IO (wire bonds).]
Port-Sliced Crossbar Switch
[Figure: port-sliced crossbar, input ports vs. output ports.]
Single-Stage PxC Switch Advantages
- Low deterministic latency
- Simple global scheduling
  - No internal blocking
  - No out-of-sequence delivery
  - Service guarantees possible
- Lower cost
  - Fewer switch elements
  - Less internal wiring
- Less power
- Higher reliability
Switch Prototype Characteristics
- System characteristics
  - 4 x 10GE ports
  - Layer 2 switching
  - Based on the ATCA standard
  - Off-the-shelf line cards
  - Proprietary switch blade
- Switch fabric
  - "Vector switch" with 4 Island chips + 2 Bridge chips (3 PxC links)
  - Off-chip connections through wire bonds
Switch Prototype
[Photo: line card, switch motherboard and switch daughtercard.]
Switch Prototype Organization
[Figure: Line Cards 1 to 4 connected to the switch fabric (4 x 4 crossbar). Blocks shown: packet generator, packet checkers, encoder/decoder, flow control, and monitors on the line card and switch sides.]
Bridge and Island Chips
[Figure: block diagram of Islands 1 to 4 and Bridges 1 and 2. Blocks shown: SER and DES for 16 x 1 Gbps DDR links, a 250 MHz clock label, PxC Tx and PxC Rx arrays, and PxC connections labeled 3 x 16 x 1 Gbps DDR.]
Bridge and Island Chips
[Die photos of the Island and Bridge chips; labeled dimensions 11.8 mm x 10.3 mm and 22.9 mm x 7.0 mm. Labeled regions: switch logic, PxC Tx array, PxC Rx array, off-chip IO, alignment marker, alignment measurement, bridge power.]
Process technology: 6-layer aluminum, TSMC 0.18 µm CMOS
Vector Switch Prototype
Scaling Up
[Figure: switch fabric elements, each with ingress/egress ports behind electrical or optical PHYs, tiled and connected by PxC links at three scales.]
- 256 ports: 2.5 Tbps
- 1,024 ports: 10 Tbps
- 4,096 ports: 40 Tbps
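The scaling figures are consistent with roughly 10 Gbps per port. Assuming that rate (our assumption, borrowed from the prototype's 10GE ports), the aggregate throughput works out as below; the slide's values appear to be rounded down. `aggregate_tbps` is a hypothetical helper.

```python
PORT_RATE_GBPS = 10.0  # ASSUMPTION: 10 Gbps per port, as in the 10GE prototype

def aggregate_tbps(ports: int, port_rate_gbps: float = PORT_RATE_GBPS) -> float:
    """Total switch throughput in Tbps: ports times per-port rate."""
    return ports * port_rate_gbps / 1000.0

for ports in (256, 1024, 4096):
    print(f"{ports:5d} ports -> {aggregate_tbps(ports):.2f} Tbps")
```

Each step quadruples the port count and, at a fixed port rate, the throughput with it.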
Scalable Switch Architecture
- "Output Buffered Switch with Input Groups"
  - Reduces memory requirements from O(n²) to O(n × # Island chips)
  - To be presented at Globecom 2007
- "Parallel Wrapped Wave Front Arbiter"
  - Increases throughput of an n x n Wrapped Wave Front Arbiter by a factor of n
  - Presented at HPSR 2007
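For reference, the classic wrapped wave front arbiter that the parallel version builds on can be sketched as below. Cells on one wrapped diagonal, (i, (i + d) mod n), lie in distinct rows and columns, so hardware can evaluate a whole diagonal at once, and a full pass takes n diagonal steps. This is only the sequential baseline, not the parallel arbiter from the HPSR 2007 paper; `wwfa` and its parameters are our own names.

```python
from typing import List, Tuple

def wwfa(requests: List[List[bool]], start: int) -> List[Tuple[int, int]]:
    """One pass of a wrapped wave front arbiter over an n x n request matrix.

    requests[i][j] is True if input i requests output j. Diagonals are visited
    beginning with diagonal `start`; cell (i, j) is granted only if it is
    requested and both row i and column j are still unmatched.
    """
    n = len(requests)
    row_free = [True] * n
    col_free = [True] * n
    grants: List[Tuple[int, int]] = []
    for step in range(n):
        d = (start + step) % n
        # Wrapped diagonal d: each cell is in a distinct row and column,
        # so all n cells could be evaluated in the same hardware cycle.
        for i in range(n):
            j = (i + d) % n
            if requests[i][j] and row_free[i] and col_free[j]:
                row_free[i] = False
                col_free[j] = False
                grants.append((i, j))
    return grants

reqs = [[False, True],
        [True, True]]
print(wwfa(reqs, start=0))  # [(1, 1)]
print(wwfa(reqs, start=1))  # [(0, 1), (1, 0)]
```

Rotating `start` between arbitration rounds is one common way to spread priority across requesters. Since one pass needs n sequential diagonal steps, an n x n arbiter of this kind yields one schedule per n steps; the slide's claim is that the parallel variant raises this throughput by a factor of n.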
Output Buffered Switch with Input Groups
[Figure: block diagram of the switch with its arbiters.]
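To make the O(n²) to O(n × # Island chips) reduction concrete, the sketch counts buffers under a simple model: a conventional output-buffered crossbar with one buffer per input/output pair, versus one buffer per output per input group, with one input group per Island chip. The model and the function names are our assumptions, not the design described in the Globecom paper.

```python
def crosspoint_buffers(n: int) -> int:
    # One buffer per (input, output) pair: O(n^2).
    return n * n

def input_group_buffers(n: int, groups: int) -> int:
    # ASSUMPTION: one buffer per (input group, output) pair, with one
    # input group per Island chip: O(n * groups).
    return n * groups

n, islands = 288, 12  # port and element counts from the 288-port PxC example
print(crosspoint_buffers(n), input_group_buffers(n, islands))
print(crosspoint_buffers(n) // input_group_buffers(n, islands))  # savings factor
```

In this model the savings factor is n / groups, here 288 / 12 = 24.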
Applications
- Data center backbone
- Blade system interconnect
- ATCA chassis aggregation
- Cluster interconnect
- System interconnect
Summary
- Proximity Communication allows for building a flat single-stage switch fabric that scales to thousands of ports and multiple Tbps of throughput
  - Low latency
  - High efficiency
  - Service guarantees
  - Low power
  - High physical density
Hans Eberle