Enabling Wide-spread Communications on Optical Fabric with MegaSwitch
Li Chen, Kai Chen, Zhonghua Zhu, Minlan Yu, George Porter, Chunming Qiao, Shan Zhong

Enabling Wide-spread Communications on Optical Fabric … · Enabling Wide-spread Communications on Optical Fabric with MegaSwitch ... 3/29/17 SING-CSE-HKUST 2 ... [Sigcomm’15]

Aug 06, 2018

Page 1:

Enabling Wide-spread Communications on Optical Fabric with MegaSwitch

Li Chen

Kai Chen, Zhonghua Zhu, Minlan Yu, George Porter, Chunming Qiao, Shan Zhong

Page 2:

Optical Networking in Data Centers

• Data center traffic demand is growing

3/29/17 SING-CSE-HKUST 2

Optical Fabric

Jupiter Rising [Sigcomm’15]

• Optical networking in data centers
  • Low cost
  • Low power consumption
  • Low wiring complexity
  • High one-to-one bandwidth

How to design an optical fabric that enables high bisection bandwidth?

Page 3:

Optical Networking in Data Centers

• Optical fabric usually provides one-to-one high-bandwidth circuits.
• Data center traffic is wide-spread.

Microsoft Data Center Network [ProjecToR, Sigcomm’16]

How to design an optical fabric that supports high-bandwidth & wide-spread traffic?

Page 4:

Prior Works

• 2010: C-Through, Helios
• 2012: OSA
• 2013: Mordia, ReacToR

Prior works reuse wavelengths temporally to meet traffic demand

Traffic Demand Matrix (rows: Src, columns: Dest)

      1  2  3
  1   -  1  1
  2   1  -  1
  3   2  1  -

Wavelength/Circuit assignments from BvN decomp., applied in sequence along the time axis (boundaries t1, t2)

Schedule 1

      1  2  3
  1   -  1  0
  2   0  -  1
  3   1  0  -

Schedule 2

      1  2  3
  1   -  0  1
  2   1  -  0
  3   0  1  -

Schedule 3

      1  2  3
  1   -  0  1
  2   0  -  0
  3   1  0  -
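The temporal reuse described on this slide can be sketched in code. The block below is a minimal illustration, assuming unit-length time slots and a brute-force greedy search over permutations. It is a simple stand-in for a BvN-style decomposition, not the algorithm of any of the cited systems, and the name `greedy_schedules` is hypothetical.

```python
from itertools import permutations


def greedy_schedules(demand):
    """Split a demand matrix into one-to-one circuit schedules.

    Each schedule is a list of (src, dst) pairs in which every source
    and every destination appears at most once; each pair serves one
    unit of demand. Brute force, so only suitable for tiny matrices.
    """
    n = len(demand)
    remaining = [row[:] for row in demand]  # work on a copy
    schedules = []
    while any(v > 0 for row in remaining for v in row):
        # Pick the permutation that serves the most pending pairs.
        best = max(
            permutations(range(n)),
            key=lambda p: sum(remaining[s][p[s]] > 0 for s in range(n)),
        )
        circuits = [(s, best[s]) for s in range(n) if remaining[s][best[s]] > 0]
        for s, d in circuits:
            remaining[s][d] -= 1
        schedules.append(circuits)
    return schedules


# Demand matrix from the slide (0-indexed, diagonal written as 0).
demand = [
    [0, 1, 1],
    [1, 0, 1],
    [2, 1, 0],
]
schedules = greedy_schedules(demand)
for t, circuits in enumerate(schedules, start=1):
    print(f"Schedule {t}: {circuits}")
```

On this demand the sketch needs three slots: node 3 has three units to send and, with one-to-one circuits, can reach only one destination per slot.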

Page 5:

High Bisection Bandwidth + Wide-Spread Connectivity

• 2010: C-Through, Helios
• 2012: OSA
• 2013: Mordia, ReacToR

Prior works take several rounds to meet a wide-spread demand

[Figure: time axis with t1 and t2; Schedule 1, Schedule 2, Schedule 3 in sequence]

MegaSwitch: Meet a wide-spread demand simultaneously

[Figure: time axis; parallel circuits Circuit 1, Circuit 2, Circuit 3 …using MegaSwitch]

Page 6:

MegaSwitch Data Plane

Enabling Spatial Reuse of Wavelength

Prototype implementation

Page 7:

Multiplexer

• MUX: k input wavelengths, 1 output fiber

[Figure: 4x1 MUX]

Page 8:

Demultiplexer

• DEMUX: 1 input fiber, k output wavelengths

[Figure: 1x4 DEMUX]

Page 9:

Wavelength Selective Switch

• WSS: w input fibers, 1 output fiber
• Same set of wavelengths can be reused on different fibers.

[Figure: 4x1 WSS with inputs Fiber 1, Fiber 2, Fiber 3, Fiber 4]
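The WSS behaviour described on this slide can be modelled in a few lines. This is a minimal sketch, assuming that each wavelength on the output fiber is taken from exactly one input fiber; the function `wss_select` and the signal labels are hypothetical and only illustrate the selection rule.

```python
def wss_select(fibers, selection):
    """Model a w x 1 wavelength selective switch.

    fibers:    list of w dicts, each mapping wavelength -> signal
               carried on that input fiber.
    selection: dict mapping wavelength -> index of the input fiber
               that wavelength is taken from.
    Returns the signals on the single output fiber, keyed by wavelength.
    """
    output = {}
    for wavelength, fiber_index in selection.items():
        if not 0 <= fiber_index < len(fibers):
            raise ValueError(f"no input fiber {fiber_index}")
        if wavelength not in fibers[fiber_index]:
            raise ValueError(f"{wavelength} is dark on fiber {fiber_index}")
        output[wavelength] = fibers[fiber_index][wavelength]
    return output


# Four input fibers that reuse the same wavelengths (a 4x1 WSS).
fibers = [
    {"red": "F1-red", "blue": "F1-blue"},
    {"red": "F2-red", "blue": "F2-blue"},
    {"red": "F3-red"},
    {"blue": "F4-blue"},
]
out = wss_select(fibers, {"red": 2, "blue": 0})
print(out)
```

Keying `selection` by wavelength is the design choice that matters here: it makes it impossible to route the same wavelength from two fibers onto the one output, where the signals would collide.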

Page 11:

Sending

Send using k wavelengths

Each ToR sends on its own fiber

[Figure: ToR 1, ToR 2, ToR 3]

Page 12:

Receiving

Recv from w other nodes

Select k wavelengths from w × k wavelengths

[Figure: ToR 1, ToR 2, ToR 3]

Page 13:

MegaSwitch: Full 3-Node Example

Send using k wavelengths

Each ToR sends on its own fiber

Recv from w other nodes

Select k wavelengths from w × k wavelengths

[Figure: ToR 1, ToR 2, ToR 3]

Page 14:

MegaSwitch: Scalability

Send using k wavelengths

Each ToR sends on its own fiber

Recv from w other nodes

Select k wavelengths from w × k wavelengths

[Figure: ToR 1, ToR 2, ToR 3]

Key parameters: w = 2, n = w + 1 = 3, k = 4, Port Count = n × k

• w: WSS port count
• n: # nodes on ring
• k: # wavelengths on a fiber

With current technology, MegaSwitch can scale to n × k = 33 × 192 = 6336 ports
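The scaling arithmetic on this slide can be checked directly. This is a small sketch using the slide's relations n = w + 1 and Port Count = n × k; the helper name `megaswitch_ports` is hypothetical, and w = 32 is inferred from n = 33 rather than stated on the slide.

```python
def megaswitch_ports(w, k):
    """Return (n, port_count) for WSS port count w and k wavelengths per fiber."""
    n = w + 1          # nodes on the ring
    return n, n * k    # each node contributes k ports


small = megaswitch_ports(w=2, k=4)      # the 3-node example on the slide
large = megaswitch_ports(w=32, k=192)   # w inferred from n = 33
print(small)
print(large)
```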

Page 15:

Unicast from H1 to H12

1. The control plane selects Blue as the wavelength for the unicast.

Page 16:

Unicast from H1 to H12

1. The control plane selects Blue as the wavelength for the unicast.
2. Configure the WSS in Node 3 to select Blue on the fiber from Node 1.

Page 17:

Unicast from H1 to H12

1. The control plane selects Blue as the wavelength for the unicast.
2. Configure the WSS in Node 3 to select Blue on the fiber from Node 1.
3. Set up routing in both EPSes.

Page 18:

Multicast from H1 to H5, H6, H7, and H10

1. The control plane selects Red as the wavelength for the multicast.
2. Configure the WSSes in OWS2 and OWS3 to select Red on the fiber from OWS1.
3. Set up routing in both EPSes.
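The three configuration steps for unicast and multicast can be sketched as plain data. This is a minimal illustration, assuming hosts are numbered consecutively with four hosts per node (H1 to H4 on node 1, H5 to H8 on node 2, H9 to H12 on node 3); the names `node_of` and `plan_circuit` and the dictionary layout are hypothetical, not a MegaSwitch API.

```python
HOSTS_PER_NODE = 4  # assumption: H1-H4 on node 1, H5-H8 on node 2, ...


def node_of(host):
    """Map a 1-based host number to its 1-based node number."""
    return (host - 1) // HOSTS_PER_NODE + 1


def plan_circuit(src_host, dst_hosts, wavelength):
    """Return the three configuration steps as plain data."""
    src_node = node_of(src_host)
    dst_nodes = sorted({node_of(h) for h in dst_hosts} - {src_node})
    return {
        # Step 1: the control plane picks the wavelength.
        "wavelength": wavelength,
        # Step 2: each receiving WSS selects it on the sender's fiber.
        "wss": {node: {wavelength: src_node} for node in dst_nodes},
        # Step 3: the EPSes at the sending and receiving nodes get routes.
        "eps_routes": [src_node] + dst_nodes,
    }


unicast = plan_circuit(1, [12], "blue")
multicast = plan_circuit(1, [5, 6, 7, 10], "red")
print(unicast)
print(multicast)
```

Under these assumptions the multicast differs from the unicast only in step 2: two WSSes (nodes 2 and 3) select the same wavelength from the same source fiber, so a single transmission reaches both.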

Page 19:

MegaSwitch: Full 3-Node Example

[Figure: three OWS+PRF boxes]

Page 20:

Prototype Implementation

• Implemented OWS+PRF box for practical deployment.
• Implemented prototype with 40 × 10Gbps ports
  • 5 nodes (OWS-EPS)
  • 8 wavelengths per node

[Figure: OWS+PRF box]

Page 21:

High Port Count & Low Switching Latency …cannot be achieved at the same time…

• Lowest WSS switching latency reported: 11.5us [Mordia, Sigcomm’13]
  • Digital Light Processing (DLP) technology.
  • Cannot scale beyond 8 ports with 11.5us switching
• MegaSwitch needs a large WSS port count to scale to more ports
  • Liquid Crystal tech. is a middle ground in terms of both port count (10~100s) and switching latency (milliseconds).
  • Measured WSS switching latency: ~3ms
• Millisecond switching latency is a hard limit for now.
  • The optics community is working on it…
• How to mitigate the impact on short flows?

Page 22:

MegaSwitch Control Plane

Basemesh for latency-sensitive applications

Page 23:

Basemesh

• Problem:
  • …when traffic matrix changes quickly
  • …when traffic matrix is estimated incorrectly
• Basemesh: a flexible overlay network on MegaSwitch to provide consistent connectivity for low-latency traffic.
  • Each node dedicates b wavelengths to construct an overlay network on a fully connected fiber mesh.

[Figure: each node's wavelengths split into Basemesh Connectivity (b wavelengths) and Dynamic Allocation (k-b wavelengths)]

Page 24:

Basemesh: Learning from DHT literature

• MegaSwitch uses the Symphony [USITS’03] DHT topology for basemesh construction.
  • b (“routing table size”) is adjustable for varying degrees of traffic volatility
  • Guaranteed average latency (“average hop count per look-up”)

[Figure: three basemesh topologies]

• Basemesh b=5, Avg Hops = 1: fully connected mesh network (w=5), recommended for low latency apps
• Basemesh b=3, Avg Hops = 1.4
• Basemesh b=1, Avg Hops = 2.5
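The trade-off between b and average hop count can be explored with a small breadth-first search. This is a rough sketch, assuming a deterministic ring overlay in which node i has directed links to nodes i+1 through i+b (mod n), and n = w + 1 = 6 nodes. Symphony's actual construction uses randomized long-distance links, so the values printed here are not expected to reproduce every figure on the slide; the function name `average_hops` is hypothetical.

```python
from collections import deque


def average_hops(n, b):
    """Average hop count from one node to all others in a ring overlay
    of n nodes where node i links to nodes i+1, ..., i+b (mod n)."""
    if not 1 <= b <= n - 1:
        raise ValueError("b must be between 1 and n - 1")
    # The overlay looks the same from every node, so one BFS suffices.
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for offset in range(1, b + 1):
            v = (u + offset) % n
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values()) / (n - 1)


for b in (1, 3, 5):
    print(f"b={b}: average hops = {average_hops(6, b):.2f}")
```

The qualitative behaviour matches the slide: with b = n - 1 every node is one hop away, and the average hop count grows as fewer wavelengths are dedicated to the basemesh.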

Page 25:

Evaluations

Testbed benchmark

Real application deployments

Page 26:

Prototype Evaluation

• Setting:
  • 5 nodes (OWS-EPS pair)
  • 8 wavelengths per node
  • 8 servers per node
  • Out-of-band control plane for EPS and OWS
  • Traffic demand matrices are known

Page 27:

Basic measurements

• Host-level stride
• All-to-all pattern [Helios, Sigcomm’10]
• Every 10s, one wavelength changes per rack.
• Measured ~20ms total reconfiguration delay.
  • WSS (~3ms), EPS routing (~5ms), transceiver initialization (~10ms)…

[Figure annotation: ~50Gbps]

MegaSwitch achieves full-bisection bandwidth when circuit is stable
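The reconfiguration figures on this slide can be put in proportion with a short calculation. This is a back-of-the-envelope sketch, assuming the affected circuit is unusable for the full ~20 ms of each reconfiguration and that one reconfiguration happens per 10 s period; the variable names are illustrative.

```python
# Approximate delay components listed on the slide, in milliseconds.
components_ms = {
    "WSS": 3,
    "EPS routing": 5,
    "transceiver initialization": 10,
}
listed_ms = sum(components_ms.values())

total_ms = 20   # measured total reconfiguration delay (approximate)
period_s = 10   # one wavelength changes per rack every 10 s

unattributed_ms = total_ms - listed_ms  # covered by the slide's trailing "…"
downtime_fraction = (total_ms / 1000) / period_s

print(f"listed components: {listed_ms} ms, unattributed: {unattributed_ms} ms")
print(f"fraction of time spent reconfiguring: {downtime_fraction:.1%}")
```

Under these assumptions the circuit is reconfiguring for a small fraction of each period, which is consistent with the slide's remark that full bisection bandwidth is reached while the circuit is stable.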

Page 28:

Redis on MegaSwitch

• Latency-sensitive application
• 1 million GET/SET requests from all nodes to a server in Node 1

Fully connected basemesh → Uniform latency (one-hop)

[Figure: Average Query Completion Time (microseconds) vs. Number of Basemesh Wavelengths (b = 1, 2, 4), one bar each for Node 1 to Node 5]

Page 29:

Apache Spark on MegaSwitch

• Parallel computing applications
• First connect the servers to a single ToR switch, and measure the bandwidth demand
• MegaSwitch updates wavelength assignment every 1 sec.
  • <10 reconfigurations in run-time.

MegaSwitch performs similarly to the optimal scenario: all servers in the same rack

[Figure: Job Completion Time (seconds) for KMeans, WordCount, and WikiPageRank; series: Single ToR and MegaSwitch (1s); data labels as extracted: 212, 320, 415, 218, 336, 423]

Page 30:

Summary

• Spatial reuse of wavelengths to provide non-blocking connectivity for all ports.
• Basemesh to provide consistent connectivity to latency-sensitive flows.
• Practical implementation of a 5-node ring of 40 ports.

MegaSwitch: An optical design that supports wide-spread, high-bandwidth traffic patterns in today’s production workloads.

More in our paper: Fault-tolerance, delay measurements, power budget, cost…

Page 31:

Got New Ideas for NSDI’18? Test them in APNet’17!

• The first Asia-Pacific Workshop on Networking
  • Aug. 3-4, 2017 @Hong Kong
• A good venue to test your innovative ideas and get feedback from the community.
  • Submit your 6-page paper on/before Apr. 21st, 2017