Ziyi Zhu , Shijia Yan, Madeleine Strom Glick, Min Yee Teh, and Keren Bergman Lightwave Research Lab, Columbia University New York, NY, US Email: [email protected] Silicon Photonic Switch-Enabled Server Regrouping Using Bandwidth Steering for Distributed Deep Learning Training

May 06, 2022

Page 1: Silicon Photonic Switch-Enabled Server Regrouping Using ...

Ziyi Zhu, Shijia Yan, Madeleine Strom Glick, Min Yee Teh, and Keren Bergman

Lightwave Research Lab, Columbia University

New York, NY, US

Email: [email protected]

Silicon Photonic Switch-Enabled Server Regrouping Using Bandwidth Steering for Distributed Deep Learning Training

Page 2: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Motivation

[Figure: three-tier datacenter network with Core, Aggregation, and ToR layers of Electrical Packet Switches (EPS, numbered 1-7) above Servers 1-16; logically grouped servers are marked]

• Servers under the same top-of-rack (ToR) switch can use the full bandwidth, but traffic across racks experiences constrained bandwidth

• Distributed deep learning workloads can require many server nodes and show strong communication patterns between these nodes

Page 3: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Motivation

[Figure: SiP OCS bandwidth steering above the ToR; Core, Aggregation, and ToR layers with switches 1-7 above Servers 1-16]

• Previous work [1-2] shows that silicon photonic (SiP) based bandwidth steering can mitigate the bottleneck at the core network level

• However, this approach alone is not ideal and does not improve job locality

[Figure legend: Servers; Logically grouped servers; EPS; OCS: Optical Circuit Switch]

[1] Michelogiannakis, George, et al. "Bandwidth steering in HPC using silicon nanophotonics." Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2019.

[2] Shen, Yiwen, et al. "Accelerating of high performance data centers using silicon photonic switch-enabled bandwidth steering." 2018 European Conference on Optical Communication (ECOC). IEEE, 2018.

Page 4: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Motivation

[Figure: fixed servers versus regrouped servers; SiP OCSs are used for bandwidth steering above the ToR and for server regrouping; Core, Aggregation, and ToR layers with EPS 1-7 above Servers 1-16]

• In this work, we propose also inserting SiP OCSs between the ToR EPSs and the servers

• SiP-enabled bandwidth steering above the ToR switches can still be applied when the port count of the OCS is limited
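To make the locality argument concrete, the following Python sketch counts how many links of a ring among logically grouped servers cross rack boundaries before and after regrouping. It is only an illustration: the rack size of four servers per ToR, the helper names, and the swap-based regrouping rule are assumptions, and the server IDs 5, 6, 9, 10 are borrowed from the testbed slide.

```python
from typing import Dict, List


def fixed_rack_map(num_servers: int = 16, servers_per_rack: int = 4) -> Dict[int, int]:
    """Static wiring (assumed): server s (1-based) sits under ToR (s - 1) // servers_per_rack + 1."""
    return {s: (s - 1) // servers_per_rack + 1 for s in range(1, num_servers + 1)}


def cross_rack_ring_links(group: List[int], rack_of: Dict[int, int]) -> int:
    """Count ring links group[i] -> group[i + 1] (wrapping) whose endpoints sit under different ToRs."""
    n = len(group)
    return sum(rack_of[group[i]] != rack_of[group[(i + 1) % n]] for i in range(n))


def regroup(group: List[int], rack_of: Dict[int, int], target_rack: int) -> Dict[int, int]:
    """Model an OCS below the ToR: swap each remote group member with a
    non-group server of the target rack, so every ToR keeps its port count."""
    new_map = dict(rack_of)
    spare = [s for s, r in rack_of.items() if r == target_rack and s not in group]
    for s in group:
        if new_map[s] != target_rack:
            other = spare.pop()  # IndexError if the target rack has no server left to swap
            new_map[s], new_map[other] = target_rack, new_map[s]
    return new_map


group = [5, 6, 9, 10]
before = fixed_rack_map()
after = regroup(group, before, target_rack=2)
print(cross_rack_ring_links(group, before))  # 2 ring links cross racks
print(cross_rack_ring_links(group, after))   # 0 ring links cross racks
```

Under these assumptions the ring 5 → 6 → 9 → 10 → 5 sends half of its links through the upper network tiers with fixed wiring, and none after the four servers are attached to the same ToR.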

Page 5: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Silicon Photonic Switches and Switching

[Figure: 2 x 2 x 2λ microring-assisted space-and-wavelength selective switch, reprinted from [3]]

[Figure: 64 x 64 Mach–Zehnder interferometer (MZI)-based switch, reprinted from [4]]

[Figure: 240 x 240 switch implemented by micro-electromechanical system (MEMS)-actuated directional couplers, reprinted from [5]]

• CMOS-compatible manufacturing processes

• Small footprint

• Promise of power-efficient, low-fabrication-cost interconnects

• Various switching types: spatial, wavelength selective, space-and-wavelength selective

* Demonstrate and develop the control plane of SiP based switching for datacenter networks

[3] Huang, Yishen, et al. "Push-pull microring-assisted space-and-wavelength selective switch." Optics Letters 45.10 (2020): 2696-2699.

[4] Chu, Tao, et al. "Fast, high-radix silicon photonic switches." 2018 Optical Fiber Communications Conference and Exposition (OFC). IEEE, 2018.

[5] Seok, Tae Joon, et al. "Wafer-scale silicon photonic switches beyond die size limit." Optica 6.4 (2019): 490-494.

Page 6: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Silicon Photonic Switch Control

• GPIO: Configuration and triggering bits

• Linux/Ubuntu: Xilinx PetaLinux, Ubuntu FS, UIO driver, TCP/IP

• Custom SiP switch daughter card*: Co-packaged DAC/ADC circuitry, SiP switches, and fiber arrays, interfaced through FMC connectors

Page 7: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Control Plane Workflow

Network Optimization
• Server regrouping
• Bandwidth steering above the ToR

Topology Management
• Link establishment
• Link removal

Job/Traffic Requirements
• Logically grouped servers
• Link monitoring

Electronic Packet Switches

SiP Switch/Network Controllers

Flow Update (OpenFlow)

Reconfiguration Request

Network Control Plane

Data Plane
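One way to picture the topology-management step is as a diff between the current and the requested OCS cross-connects, yielding the links to remove and the links to establish. The Python sketch below is a hypothetical illustration only: the port names, the `plan_reconfiguration` helper, and the dictionary representation are assumptions, not the controller's actual interface.

```python
from typing import Dict, List, Tuple

Circuit = Tuple[str, str]  # (OCS input port, OCS output port); names are hypothetical


def plan_reconfiguration(
    current: Dict[str, str], target: Dict[str, str]
) -> Tuple[List[Circuit], List[Circuit]]:
    """Return (removals, establishments) that turn `current` into `target`.

    Circuits that are identical in both maps are left untouched, so traffic
    on them is not disturbed by the reconfiguration.
    """
    removals = sorted((i, o) for i, o in current.items() if target.get(i) != o)
    establishments = sorted((i, o) for i, o in target.items() if current.get(i) != o)
    return removals, establishments


# Hypothetical example: move servers 9 and 10 from ToR 3 to spare ports on ToR 2.
current = {"srv5": "tor2.p1", "srv9": "tor3.p1", "srv10": "tor3.p2"}
target = {"srv5": "tor2.p1", "srv9": "tor2.p3", "srv10": "tor2.p4"}

removals, establishments = plan_reconfiguration(current, target)
for circuit in removals:
    print("remove   ", circuit)
for circuit in establishments:
    print("establish", circuit)
```

In this sketch, removals are listed before establishments so that ports are freed before new circuits are requested; the flow update on the packet switches would follow once the new circuits are in place. That ordering is an assumption about a sensible sequence, not something the slide specifies.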

Page 8: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Distributed Deep Learning Training

Synchronized Training: Ring allreduce

Asynchronized Training: Parameter server and workers

Synchronized training with ring allreduce:
• torch.distributed: initialize the process group, ip[rank0]: port
• model.to(device)  # GPU
• torch.nn.parallel.DistributedDataParallel: gradient synchronization communications

[Figure: ring allreduce among four machines (M), Rank 0 to Rank 3]
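The ring allreduce pattern used for synchronized training can be sketched as a plain-Python simulation. This is not the torch.distributed implementation; it only illustrates the two phases (reduce-scatter, then all-gather) on a ring, and for simplicity it assumes each worker's gradient vector has exactly one element per worker, so that every chunk is a single number.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a sum-allreduce over a ring of len(grads) workers.

    grads[r] is worker r's local gradient vector; chunk c is element c.
    Returns the vectors held by each worker after the exchange.
    """
    n = len(grads)
    assert all(len(g) == n for g in grads), "sketch assumes one chunk per worker"
    data = [list(g) for g in grads]

    # Phase 1, reduce-scatter: in step t, worker r sends chunk (r - t) mod n to
    # its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, chunk, value in sends:  # all sends of a step happen "at once"
            data[(r + 1) % n][chunk] += value

    # Worker r now holds the complete sum of chunk (r + 1) mod n.
    # Phase 2, all-gather: the completed chunks travel once around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, chunk, value in sends:
            data[(r + 1) % n][chunk] = value

    return data


local = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
for rank, vec in enumerate(ring_allreduce(local)):
    print(rank, vec)  # every rank ends with [1111, 2222, 3333, 4444]
```

Each worker only ever talks to its ring neighbour, which is why the placement of consecutive ranks relative to rack boundaries matters for this workload.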

Neural networks: VGG [6], Dataset: Imagenette (https://github.com/fastai/imagenette)

[6] Simonyan, Karen, et al. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

[Figure: parameter server (PS) and worker machines exchanging weights and gradients; ranks labeled Rank 0, Rank 0’ and Rank 1, Rank 1’. M: Machine, W: Weights, G: Gradients]

Asynchronized training with a parameter server (PS) and workers:
• torch.multiprocessing: multiple process groups
• model.to(device)  # GPU; model.share_memory()  # shared model
• torch.distributed: initialize a process group, ip[rank0]: port; initialize another process group, ip[rank0’]: port
• torch.distributed.broadcast: distribute weights from PS to worker
• torch.distributed.reduce: collect gradients from worker to PS
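The parameter-server exchange can be illustrated with a small single-process simulation. The sketch below is a toy under stated assumptions: the `ParameterServer` class, the quadratic objective, and the learning rate are hypothetical, and `pull`/`push` merely stand in for the roles that broadcast (weights, PS to worker) and reduce (gradients, worker to PS) play on the slide.

```python
from typing import List, Tuple


class ParameterServer:
    """Toy parameter server holding the shared weights (hypothetical helper)."""

    def __init__(self, weights: List[float], lr: float) -> None:
        self.weights = list(weights)
        self.lr = lr
        self.version = 0  # number of updates applied so far

    def pull(self) -> Tuple[List[float], int]:
        """Worker fetches the current weights (role of broadcast: PS -> worker)."""
        return list(self.weights), self.version

    def push(self, grads: List[float]) -> None:
        """Worker delivers gradients (role of reduce: worker -> PS); applied immediately."""
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grads)]
        self.version += 1


def gradient(weights: List[float], target: List[float]) -> List[float]:
    """Gradient of the toy loss 0.5 * sum((w - t)^2)."""
    return [w - t for w, t in zip(weights, target)]


ps = ParameterServer([0.0], lr=0.5)
target = [1.0]

# Both workers pull before either pushes, so the second push uses stale weights.
w_a, version_a = ps.pull()
w_b, version_b = ps.pull()
ps.push(gradient(w_a, target))
ps.push(gradient(w_b, target))  # computed against version 0, applied to version 1
print(ps.weights, ps.version)
```

Unlike the ring allreduce case, no worker waits for another here; the cost is that a gradient may be computed against weights that are already out of date when it is applied.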

Page 9: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Testbed – Baseline, Server Regrouping, and Bandwidth Steering

[Figure: three testbed topologies, each with EPS1-EPS4, EPS5-EPS6, EPS7, and GPU servers 5, 6, 9, 10]
• Baseline (label: Aggregated)
• Server Regrouping, with a SiP OCS (label: Released)
• Server Regrouping + Bandwidth Steering Above the ToR, with two SiP OCSs (label: Released)

[Figure legend: Electronic Packet Switches; Servers; GPU Servers]

Page 10: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Experimental Setup

[Figure: experimental setup and optical spectra at receivers RX1-RX6 for Configuration 1 and Configuration 2; x-axis: Wavelength (nm), 1544 to 1556; y-axis: Power (dBm), -70 to -20]

PC – Polarization Controller
EDFA – Erbium-Doped Fiber Amplifier
DUX – Optical Multiplexer
AMP – Electrical Amplifier
DAC – Digital to Analog Converter

λ1 = 1545.32 nm, λ2 = 1546.92 nm, λ3 = 1553.33 nm, λ4 = 1554.94 nm, λ5 = 1554.94 nm, λ6 = 1556.55 nm
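For reference, the listed wavelengths can be converted to optical frequencies with f = c/λ. The short Python sketch below does this for λ1 and λ2; the function name is an assumption, and the conversion uses the vacuum speed of light. The two channels come out roughly 0.2 THz (200 GHz) apart.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum (exact by SI definition)


def wavelength_nm_to_thz(wavelength_nm: float) -> float:
    """Convert a vacuum wavelength in nanometres to a frequency in terahertz."""
    return C_M_PER_S / (wavelength_nm * 1e-9) / 1e12


f1 = wavelength_nm_to_thz(1545.32)  # λ1 from the slide
f2 = wavelength_nm_to_thz(1546.92)  # λ2 from the slide
print(f"λ1: {f1:.2f} THz, λ2: {f2:.2f} THz, spacing: {(f1 - f2) * 1000:.0f} GHz")
```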

Page 11: Silicon Photonic Switch-Enabled Server Regrouping Using ...


Acknowledgement:

This work was partly supported by the U.S. Department of Energy (DoE) SBIR Photonic-Storage Subsystem Input/Output (P-SSIO) Interface Project, by the Advanced Research Projects Agency-Energy (ARPA-E) under the Enlightened Project, and by the National Security Agency (NSA) Laboratory for Physical Sciences (LPS) Research Initiative.

Thank you

[email protected]