Optical Switching in the Datacenter
Kostas Katrinis – IBM Research, Ireland

Glimpse of advanced R&D on introducing optical switches as datacenter interconnect elements
Transcript
Page 1: Optical Switching in the Datacenter


MEMS Optical Switching in the Datacenter

Silicon Photonics for Next Generation Computing Systems
HiPEAC Computer Systems Week, October 2013

Kostas Katrinis – IBM Research, Ireland

Page 2: Optical Switching in the Datacenter

Outline

● Scope & Background
● Motivation & Challenges
● Hybrid Network Architecture
● Data Plane
● Control Plane
● System Evaluation
● Use Cases
● Conclusion

Part I – Introduction

Part II – Arch & Tech

Part III – Evaluation

Part IV – Use Cases

Page 3: Optical Switching in the Datacenter

Scope (Part I – Background)

● Target Markets:
  ● (Cloud) Datacenters – Θ(10K) servers
  ● HPC Clusters (82% of the Nov '12 Top 500)
● Target Systems:
  ● Data Network Fabric

Page 4: Optical Switching in the Datacenter

DC Traffic Trends (Part I – Background)

● 76% of all traffic is intra-datacenter *
● Total DC traffic CAGR of 33% through 2015 *
● The percentage of traffic exiting the rack is high (up to 90%) **
● ...and we expect it to increase (scale-out workloads)

* Cisco Global Cloud Index: Forecast and Methodology, 2011–2016
** Benson et al., “Network Traffic Characteristics of Data Centers in the Wild”, IMC '10

Page 5: Optical Switching in the Datacenter

Design Trade-offs (Performance)

[Figure: cost vs. performance trade-off, ranging from highly oversubscribed to high-bisection designs]

Part I – Background

● We need high capacity between any two points in the DC
● ...and at various scales (incremental deployment)
● ...and we need $$$

Page 6: Optical Switching in the Datacenter

Motivating Example

Estimated List Prices

Item                       | List Price (USD)** | Qty     | Total List Price (USD)
BNT G8264 (64-port switch) | 30,000             | 5,120   | 153,600,000
BNT SFP+ SR Transceiver    | 665                | 262,144 | 174,325,760
MM Fiber Cable             | 28                 | 131,072 | 3,670,016

** Source: ibm.com

Fabric Price: ≈331M USD ≈ Compute Price (@ 5K USD/server)

Part I – Motivation

● Full-bisection fat-tree @ 65K servers
● Building block: 64-port Ethernet switches (à la VL2*)
● Denser switches will not help you (e.g. the 288-port Mellanox Vantage)

* Greenberg et al., “VL2: A Scalable and Flexible Data Center Network”, SIGCOMM 2009
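As a sanity check, the quantities above follow from the standard k-ary fat-tree construction with k = 64, assuming host links are copper and only the two inter-switch tiers use fiber (which is what the per-item quantities suggest):

```latex
\text{servers} = \tfrac{k^3}{4} = 65{,}536,\qquad
\text{switches} = \tfrac{5k^2}{4} = 5{,}120,\qquad
\text{inter-switch fibers} = 2\cdot\tfrac{k^3}{4} = 131{,}072,\qquad
\text{transceivers} = 2 \cdot 131{,}072 = 262{,}144
```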

Page 7: Optical Switching in the Datacenter

Motivating Example (cont.)

Total Price: 331M USD

AND...

#Cables to route: 131,000

[Photo caption: Can you count the birds in the nest?]

Part I – Motivation

Page 8: Optical Switching in the Datacenter

Paradigm Shift – Switch Light

Tiltable mirrors implemented via MEMS (Micro-Electro-Mechanical Systems)

+ High-radix (320 ports you can buy, 1024 feasible)

+ No transceivers

+ Decreasing $/port

+ 50x less Watts/port vs. electronics

+ Can switch up to ~1Tbps

+ Protocol Agnostic

Metric           | Electronic Switch (Ethernet) | Optical MEMS switch | Gain
Price/Port (USD) | 1100 (includes TxRx cost)    | 350                 | x3
Bandwidth/Port   | 10Gbps                       | “Rate-free”         | x∞
Power/Port (W)   | 10                           | 0.2                 | x50
Requires TxRxs   | Yes                          | No                  |

Part I – Motivation

Page 9: Optical Switching in the Datacenter

MEMS Switch in the DC (Part I – Challenges)

● Repurposing is not free:
  ● 10–200 ms switching latency vs. sub-μsec Ethernet switching (point-to-point “circuits”)
  ● L2 spanning-tree forwarding is a bad option for ROI (applies to electronic redundant topologies too!)
  ● Traffic Engineering (becomes dynamic Topology Management?) is important
  ● Collectives?

Page 10: Optical Switching in the Datacenter

Related Approaches

Codename | Affiliation | Targets | Working Prototype | Comments
Helios | UCSD/Google | HPC/DC | Yes | First-principles; lacking integration, no edge routing, no supporting infrastructure (e.g. monitoring)
c-Through | CMU/Rice/Intel | DC | No (emulation) | Reconfiguration algorithms, traffic splitting; problems not addressed at scale
OSA (previously Proteus) | Northwestern/UIUC/NEC | DC | Yes (with Wavelength-Division Multiplexing) | Mostly pursuing multiple wavelengths/fiber
Plexxi | Plexxi | DC | Product offering | Not a re-configurable architecture; low-bisection ring between racks

Part II – Architecture

Page 11: Optical Switching in the Datacenter

Hybrid Fabric Architecture (Part II – Architecture)

Page 12: Optical Switching in the Datacenter

High-level Functionality (Part II – Architecture)

● Bijective TE (see the sketch below):
  ● Mice flows are routed via the 1G electronic fabric
  ● Elephant flows are routed via the 10G optical fabric
● Optical fabric is reconfigurable:
  ● Centralized control optimizes the topology against traffic pattern and demand volume
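A minimal sketch of the mice/elephant split, assuming a simple rate threshold decides which fabric carries a flow; the threshold value and all names here are illustrative, not from the talk:

```python
# Hypothetical classifier: flows above a byte-rate threshold are "elephants"
# and get steered to the optical fabric; everything else stays electronic.
ELEPHANT_THRESHOLD_BPS = 100e6  # illustrative 100 Mb/s cut-off

def pick_fabric(flow_bytes: int, window_s: float) -> str:
    """Return which fabric should carry the flow for the next interval."""
    rate_bps = 8 * flow_bytes / window_s
    return "optical-10G" if rate_bps >= ELEPHANT_THRESHOLD_BPS else "electronic-1G"

# Example: 1 GB observed over 10 s -> 800 Mb/s -> elephant
print(pick_fabric(10**9, 10.0))  # optical-10G
```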

Page 13: Optical Switching in the Datacenter

Multi-hop & Multi-path Data Plane (Part II – Data Plane)

● Our simulation work showed that multi-hop routing reduces the overhead of the slow switching latency (see the sketch below):
  ● Relaxes the impact of slowly movable p2p circuits
  ● Larger topology space (not just bipartite graphs)
● Multi-path as a throughput booster (utilization)

Multi-hop example: Rack-2 reaches Rack-4 via the Rack-3 TOR switch
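A small sketch of multi-hop path computation, assuming the active circuits form an undirected graph between TOR switches and a plain BFS finds the shortest rack-level path (all names are illustrative):

```python
from collections import deque

def shortest_multihop_path(circuits, src, dst):
    """BFS over the optical circuit graph; returns a TOR-level path or None.
    circuits: dict mapping each TOR to the set of TORs it has a circuit to."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        tor = queue.popleft()
        if tor == dst:
            path = []
            while tor is not None:  # walk parents back to the source
                path.append(tor)
                tor = parent[tor]
            return path[::-1]
        for nbr in circuits.get(tor, ()):
            if nbr not in parent:
                parent[nbr] = tor
                queue.append(nbr)
    return None

# Example matching the slide: Rack-2 reaches Rack-4 via Rack-3
circuits = {"R1": {"R2"}, "R2": {"R1", "R3"}, "R3": {"R2", "R4"}, "R4": {"R3"}}
print(shortest_multihop_path(circuits, "R2", "R4"))  # ['R2', 'R3', 'R4']
```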

Page 14: Optical Switching in the Datacenter

VLAN-based Forwarding

● Routing over 802.1p overlays
● TOR ports along a multi-hop path are assigned the same VLAN-ID
● Paths “touching” common TOR switch(es) use distinct VLAN-IDs
● Dynamic VLAN-ID assignment/revoking via the central controller (sketched below)

Part II – Data Plane
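A toy sketch of the allocation constraint, assuming the controller hands out IDs from the 802.1Q pool so that two paths sharing a TOR never get the same ID, while TOR-disjoint paths may reuse one (all names are illustrative):

```python
def assign_vlan(path_tors, allocations):
    """Pick a VLAN-ID for a new multi-hop path such that no other path
    touching a common TOR uses the same ID.
    allocations: dict of vlan_id -> list of TOR sets already using it."""
    new = set(path_tors)
    for vid in range(2, 4095):  # skip 0/4095 (reserved) and 1 (default VLAN)
        if all(not (new & tors) for tors in allocations.get(vid, [])):
            allocations.setdefault(vid, []).append(new)
            return vid
    raise RuntimeError("VLAN-ID space exhausted")

allocations = {}
print(assign_vlan(["R2", "R3", "R4"], allocations))  # 2
print(assign_vlan(["R3", "R5"], allocations))        # 3 (shares R3 with path 1)
print(assign_vlan(["R6", "R7"], allocations))        # 2 (disjoint, ID reused)
```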

Page 15: Optical Switching in the Datacenter

Server-based Path Selection (Part II – Data Plane)

● OVS-based
● Mice flows go via eth0 by default; elephant flowspecs are pushed by the controller to OVS (see the sketch below)
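A hedged sketch of what pushing an elephant flowspec into OVS could look like, driving the standard ovs-ofctl CLI from Python; the bridge name, destination prefix and port numbers are illustrative assumptions, not the talk's actual configuration:

```python
import subprocess

def push_elephant_flow(bridge, dst_ip, optical_port):
    """Steer traffic to dst_ip out of the port facing the optical fabric.
    Mice keep matching the lower-priority NORMAL rule installed below."""
    flow = f"priority=100,ip,nw_dst={dst_ip},actions=output:{optical_port}"
    subprocess.run(["ovs-ofctl", "add-flow", bridge, flow], check=True)

# Default rule: everything else behaves like a learning switch (eth0 side)
subprocess.run(["ovs-ofctl", "add-flow", "br0", "priority=0,actions=NORMAL"],
               check=True)
push_elephant_flow("br0", "10.0.4.0/24", optical_port=2)
```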

Page 16: Optical Switching in the Datacenter

VLAN Forwarding – A Bird's-eye View

● Not clean: re-purposing one feature to cancel another (spanning trees)
● Not infinitely scalable (4094 IDs)
● Server support lies outside the datacenter provider's/networking vendor's control in some models (e.g. IaaS):
  ● The tenant is the master of the server
● VLAN tagging is slow (coming up...)

Part II – Data Plane

Page 17: Optical Switching in the Datacenter

VLANs vs. OpenFlow Performance (Part II – Data Plane)

● All measurements on an IBM G8264 (7.6.1 firmware)
● When switching 32 ports, OpenFlow is 2x faster
● VLAN tagging latency has a 700 ms “DC” (constant) component
● OpenFlow support is work-in-progress

[Chart: switching latency, 802.1p vs. OpenFlow]

Page 18: Optical Switching in the Datacenter

Controller Loop (Part II – Control Plane)

Page 19: Optical Switching in the Datacenter

Dynamic Topology Management

● Input:
  ● Traffic Matrix (bytes)
  ● Optical physical topology
  ● Circuit state (used/not used)
● Output:
  ● Optical topology (optical cross-connections)
  ● Mapping of multi-hop paths to circuits
● Goal:
  ● Maximize optical throughput (the volume of the TM routed optically); a simplified formulation follows below

Part II – Control Plane
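A simplified sketch of the underlying optimization, restricted to single-hop circuits (the actual formulation in the group's papers also covers multi-hop paths). Here T_{sd} is the traffic-matrix entry between racks s and d, x_{sd} indicates whether a circuit connects them, and P is the number of optical ports per TOR:

```latex
\max \sum_{s,d} T_{sd}\, x_{sd}
\quad \text{s.t.}\quad
\sum_{d} x_{sd} \le P \;\;\forall s,\qquad
\sum_{s} x_{sd} \le P \;\;\forall d,\qquad
x_{sd} \in \{0,1\}
```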

Page 20: Optical Switching in the Datacenter

Dynamic Topology Mgmt Algorithms

● Showed that the problem is NP-complete (reduction to the circular arrangement problem)
● Heuristic approaches:
  ● High-Demand First (HDF): cluster demand based on proximity and fit as much demand as possible into the optical fabric's available capacity (a greedy sketch follows below)
  ● Simulated Annealing (SA): couple HDF loops with SA optimization
● ILP modelling for a sense of optimality at smaller scales

Part II – Control Plane
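A minimal greedy sketch in the spirit of High-Demand First: serve the largest demands first while TOR optical ports remain. This is a toy under the single-hop assumption above; the port count and all names are illustrative, not the published algorithm:

```python
def hdf_topology(traffic, ports_per_tor):
    """Greedy High-Demand-First-style circuit selection.
    traffic: dict (src, dst) -> bytes; returns the list of chosen circuits."""
    free = {}  # remaining optical ports per TOR
    circuits = []
    # Visit demands in decreasing volume, taking each if both ends have ports
    for (src, dst), volume in sorted(traffic.items(),
                                     key=lambda kv: kv[1], reverse=True):
        if free.setdefault(src, ports_per_tor) > 0 and \
           free.setdefault(dst, ports_per_tor) > 0:
            free[src] -= 1
            free[dst] -= 1
            circuits.append((src, dst))
    return circuits

demand = {("R1", "R2"): 900, ("R1", "R3"): 500, ("R2", "R3"): 100}
print(hdf_topology(demand, ports_per_tor=1))  # [('R1', 'R2')]
```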

Page 21: Optical Switching in the Datacenter

Topology Mgmt Algos Evaluation

● Hop-bytes as the throughput measure here (lower is better; defined below)
● SA-100 is the best in the throughput vs. performance trade-off

Part II – Control Plane
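For reference, hop-bytes weights each flow's volume by the number of links it traverses, so lower totals mean demand is served over shorter paths:

```latex
\text{hop-bytes} = \sum_{f \in \text{flows}} \text{bytes}(f) \times \text{hops}(f)
```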

Page 22: Optical Switching in the Datacenter

Cost Competitiveness

● Comparison vs. fat-tree at various over-subscription levels (parameter β)
● Hybrid is 30% cheaper at full bisection
● Competitiveness diminishes with over-subscription, but hybrid is a winner throughout

Part III – Cost Eval.

Page 23: Optical Switching in the Datacenter

Proof-of-Concept Prototype (Part III – Perf. Eval.)

Page 24: Optical Switching in the Datacenter

Evaluation Scenarios

● 4 racks, 40 servers (10 servers/rack)
● Equi-cost comparisons vs. fat-tree:
  ● For a given hybrid network setup (parameter β), evaluate application performance against an electronic fat-tree
● HPC workload input:
  ● NAS Parallel Benchmarks
  ● FFTW

Part III – Perf. Eval.

Page 25: Optical Switching in the Datacenter

Evaluation Results Set-1

● Comparison vs. 1:25 fat-tree
● 25% improvement for most workloads
● At least as good for 2 cases

Part III – Perf. Eval.

Page 26: Optical Switching in the Datacenter

Evaluation Results Set-2

● Comparison vs. 1:4 fat-tree
● Up to 35% improvement
● At least as good for 2 cases

Part III – Perf. Eval.

Page 27: Optical Switching in the Datacenter

Further “Killer” Use-cases (Part IV – Use-cases)

● HPC workloads are challenging (collectives, dynamic)
● We are working on integrating and evaluating:
  ● Data-intensive (Big Data) frameworks (Hadoop)
  ● Massive VM migration
  ● Checkpointing
● ...on-going

Page 28: Optical Switching in the Datacenter

Conclusions

● Hybrid optical/electrical networks are cost-competitive
● Results show that performance is not degraded (to say the least)
● Edge engineering burden is not necessarily less than routing/flow scheduling in an electronic fat-tree
● Main challenges ahead:
  ● SDN edge
  ● Bring Traffic Engineering/Topology Management closer to the application
  ● Optical performance in multi-stage optical setups
  ● More use-cases to increase confidence/persuasion

Part IV – Conclusions

Page 29: Optical Switching in the Datacenter

Results Publications

● Diego Lugones, Konstantinos Christodoulopoulos, Kostas Katrinis, Marco Ruffini, Donal O'Mahony, and Martin Collier, "Accelerating communication-intensive parallel workloads using commodity optical switches and a software-configurable control stack", in Proceedings of the 2013 International European Conference on Parallel and Distributed Computing (Euro-Par 2013), Aachen, Germany, August 2013.
● Kostas Katrinis, Guohui Wang and Laurent Schares, "SDN control for hybrid OCS/electrical datacenter networks: an enabler or just a convenience?", in Proceedings of the 2013 IEEE Summer Topicals, IEEE Photonics Society, Hawaii, USA, July 2013.
● Konstantinos Christodoulopoulos, Kostas Katrinis, Marco Ruffini and Donal O'Mahony, "Tailoring the Network to the Problem: Topology Configuration in Hybrid EPS/OCS Interconnects", in Concurrency and Computation: Practice and Experience, Wiley Interscience, invited article (in press).
● Diego Lugones, Kostas Katrinis, Martin Collier and Georgios Theodoropoulos, "Parallel Simulation Models for the Evaluation of Future Large-Scale Datacenter Networks", in Proceedings of the 16th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications, Dublin, Ireland, October 2012.
● Konstantinos Christodoulopoulos, Marco Ruffini, Donal O'Mahony and Kostas Katrinis, "Topology Configuration in Hybrid EPS/OCS Interconnects", in Proceedings of the 2012 International European Conference on Parallel and Distributed Computing (Euro-Par 2012), Rhodes Island, Greece, August 2012 (Distinguished Paper Award).
● Diego Lugones, Kostas Katrinis and Martin Collier, "A Reconfigurable Optical/Electrical Interconnect Architecture for Large-scale Clusters and Datacenters", in Proceedings of the ACM International Conference on Computing Frontiers (CF '12), Cagliari, Italy, May 2012 (Best Paper Award).

Page 30: Optical Switching in the Datacenter

Credit

Dr. Diego Lugones (co-worker)
Dr. Martin Collier (co-author)
Dr. K. Christodoulopoulos (co-worker)
Dr. Marco Ruffini (co-author)
Prof. Dr. Donal O'Mahony (co-author)

Trinity College Dublin
Dublin City University

Page 31: Optical Switching in the Datacenter


THANK YOU!

Q&A