ElasticSwitch : Practical Work-Conserving Bandwidth Guarantees for Cloud Computing


ElasticSwitch: Practical Work-Conserving Bandwidth Guarantees for Cloud Computing

Lucian Popa, Praveen Yalagandula*, Sujata Banerjee, Jeffrey C. Mogul+, Yoshio Turner, Jose Renato Santos
HP Labs, *Avi Networks, +Google

Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
- Tenants can affect each other's traffic
- MapReduce jobs can affect the performance of user-facing applications
- Large MapReduce jobs can delay the completion of small jobs
- Bandwidth guarantees offer predictable performance

Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
- Hose model: the VMs of one tenant (X, Y, Z) attach to a virtual (imaginary) switch VS through links with bandwidth guarantees BX, BY, BZ
[Figure: hose model — VMs X, Y, Z connected to virtual switch VS by links of capacity BX, BY, BZ]
- Other models based on the hose model, such as TAG [HotCloud'13], also apply

Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
2. Work-Conserving Allocation
- Tenants can use spare bandwidth from unallocated or underutilized guarantees
- Significantly increases performance: average traffic is low [IMC09, IMC10] and traffic is bursty

Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
2. Work-Conserving Allocation

[Figure: X→Y bandwidth over time under ElasticSwitch — the rate stays at the guarantee Bmin when everything is reserved and used, and rises above Bmin into free capacity when spare bandwidth is available]

Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
2. Work-Conserving Allocation
3. Be Practical
- Topology independent: work with oversubscribed topologies
- Inexpensive: per-VM/per-tenant queues are expensive, so work with commodity switches
- Scalable: a centralized controller can become a bottleneck, so use a distributed solution; the problem is hard to partition, since VMs can cause bottlenecks anywhere in the network

Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
2. Work-Conserving Allocation
3. Be Practical

Prior Work | Guarantees | Work-conserving | Practical
Seawall [NSDI'11], NetShare [TR], FairCloud (PS-L/N) [SIGCOMM'12] | X (fair sharing) | √ | √
SecondNet [CoNEXT'10] | √ | X | √
Oktopus [SIGCOMM'10] | √ | X | ~X (centralized)
Gatekeeper [WIOV'11], EyeQ [NSDI'13] | √ | √ | X (congestion-free core)
FairCloud (PS-P) [SIGCOMM'12] | √ | √ | X (queue/VM)
Hadrian [NSDI'13] | √ | √ | X (weighted RCP)
ElasticSwitch | √ | √ | √

Outline

- Motivation and Goals
- Overview
- More Details: Guarantee Partitioning, Rate Allocation
- Evaluation

ElasticSwitch Overview: Operates At Runtime

VM setup:
- Tenant selects bandwidth guarantees (models: Hose, TAG, etc.)
- VMs are placed, and Admission Control ensures all guarantees can be met (e.g., Oktopus [SIGCOMM'10], Hadrian [NSDI'13], CloudMirror [HotCloud'13])

Runtime (ElasticSwitch):
- Enforce bandwidth guarantees
- Provide work-conservation

ElasticSwitch Overview: Runs In Hypervisors

- Resides in the hypervisor of each host
- Distributed: communicates pairwise following data flows

[Figure: several hosts, each running VMs on top of an ElasticSwitch-enabled hypervisor]

ElasticSwitch Overview: Two Layers

- Guarantee Partitioning: provides guarantees
- Rate Allocation: provides work-conservation

Both layers run inside the hypervisor.

ElasticSwitch Overview: Guarantee Partitioning

1. Guarantee Partitioning: turns hose-model guarantees (BX, BY, BZ at the virtual switch VS) into VM-to-VM pipe guarantees (e.g., BXY, BXZ for VM X)
- VM-to-VM control is necessary; coarser granularity is not enough
- VM-to-VM guarantees give bandwidths as if the tenant communicated on a physical hose network
- Intra-tenant operation

ElasticSwitch Overview: Rate Allocation

1. Guarantee Partitioning: turns hose-model guarantees into VM-to-VM pipe guarantees
2. Rate Allocation: uses rate limiters in the hypervisors; increases the rate between X and Y above the guarantee (RateXY ≥ BXY) when there is no congestion between X and Y
- Uses unreserved/unused capacity: work-conserving allocation
- Inter-tenant operation

ElasticSwitch Overview: Periodic Application

- Guarantee Partitioning is applied periodically and on new VM-to-VM pairs; it passes VM-to-VM guarantees down to Rate Allocation
- Rate Allocation is applied periodically, more often, and feeds demand estimates back to Guarantee Partitioning

Outline

- Motivation and Goals
- Overview
- More Details: Guarantee Partitioning, Rate Allocation
- Evaluation

Guarantee Partitioning – Overview

Each VM's hose guarantee is divided among its VM-to-VM pairs (e.g., BX splits into BXY and BXZ; Y's guarantee yields shares for the X→Y, T→Y, and Q→Y pairs).

Goals:
A. Safety – don't violate the hose model
B. Efficiency – don't waste guarantee
C. No Starvation – don't block traffic

Approach: max-min allocation

Guarantee Partitioning – Overview

Example: BX = … = BQ = 100 Mbps, where X sends to Y and Z, and T and Q also send to Y. Max-min allocation gives the X→Y, T→Y, and Q→Y pairs 33 Mbps each, and the X→Z pair 66 Mbps.

Goals:
A. Safety – don't violate the hose model
B. Efficiency – don't waste guarantee
C. No Starvation – don't block traffic

Guarantee Partitioning – Operation

- The hypervisor divides the guarantee of each hosted VM between its VM-to-VM pairs, in each direction (e.g., BX into B_X^{X→Y} and B_X^{X→Z}; BY into B_Y^{X→Y}, B_Y^{T→Y}, B_Y^{Q→Y}).
- The source hypervisor uses the minimum of the source and destination guarantees for the pair:

BXY = min( B_X^{X→Y} , B_Y^{X→Y} )
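The two rules above can be sketched in a few lines of Python. This is a hypothetical illustration (function names and the equal split are illustrative; the actual implementation divides guarantees max-min by demand):

```python
def pair_guarantee(src_share, dst_share):
    # B_XY = min(B_X^{X->Y}, B_Y^{X->Y}): the source hypervisor takes the
    # minimum of the source-side and destination-side shares for the pair.
    return min(src_share, dst_share)

def equal_split(vm_guarantee, peers):
    # Simplest division: split a VM's hose guarantee equally across its
    # active VM-to-VM pairs (demand-aware max-min refines this).
    return {peer: vm_guarantee / len(peers) for peer in peers}

# Slide example: B_X = ... = B_Q = 100 Mbps.
# X talks to Y and Z, so X's side of the X->Y pair gets 50 Mbps;
# Y hears from X, T, Q, so Y's side gets ~33 Mbps.
shares_x = equal_split(100, ["Y", "Z"])
shares_y = equal_split(100, ["X", "T", "Q"])
b_xy = pair_guarantee(shares_x["Y"], shares_y["X"])  # min(50, ~33) = ~33 Mbps
```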

Guarantee Partitioning – Safety

BXY = min( B_X^{X→Y} , B_Y^{X→Y} )

Taking the minimum of the two sides' shares ensures safety: hose-model guarantees are not exceeded at either the source or the destination.

Guarantee Partitioning – Operation

Example: BX = … = BQ = 100 Mbps. X sends to Y and Z, while Y also receives from T and Q:
- X's side of the X→Y pair: B_X^{X→Y} = 50 (X splits 100 Mbps between Y and Z)
- Y's side of the X→Y pair: B_Y^{X→Y} = 33 (Y splits 100 Mbps among X, T, Q)
- BXY = min(50, 33) = 33 Mbps

Guarantee Partitioning – Efficiency

1. What happens when flows have low demands?
The hypervisor divides guarantees max-min based on demands (future demands are estimated from history), so guarantee is not wasted on pairs that cannot use it.
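A demand-aware max-min division can be sketched as follows (a hypothetical illustration, not the actual ElasticSwitch code): small demands are satisfied fully, and the remainder is split equally among the still-unsatisfied pairs.

```python
def max_min_split(total, demands):
    """Max-min division of a VM's guarantee among its VM-to-VM pairs,
    given per-pair demand estimates (sketch)."""
    remaining, alloc = total, {}
    pending = sorted(demands.items(), key=lambda kv: kv[1])  # smallest first
    while pending:
        fair = remaining / len(pending)
        peer, demand = pending[0]
        if demand <= fair:
            # This pair's demand fits under the fair share: satisfy it fully.
            alloc[peer] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining demand exceeds the fair share: split equally.
            for peer, _ in pending:
                alloc[peer] = fair
            return alloc
    return alloc
```

For instance, a 100 Mbps guarantee with demands {Y: 10, Z: 200} gives Y its full 10 Mbps and hands the remaining 90 Mbps to Z, rather than wasting 40 Mbps on Y.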

Guarantee Partitioning – Efficiency

2. How to avoid unallocated guarantees?
In the example, Y's 33 Mbps share limits the X→Y pair, so part of X's 50 Mbps share would go unused; X can reallocate it to the X→Z pair, raising X→Z from 50 toward 66 Mbps.

Guarantee Partitioning – Efficiency

The source considers the destination's allocation when the destination is the bottleneck: X→Y settles at 33 Mbps and X→Z at 66 Mbps. Guarantee Partitioning converges.

Outline

- Motivation and Goals
- Overview
- More Details: Guarantee Partitioning, Rate Allocation
- Evaluation

Rate Allocation

Guarantee Partitioning hands the pair guarantee BXY to Rate Allocation, which sets the rate RXY on the X→Y limiter in the source hypervisor, using congestion data fed back from the destination.

[Figure: RXY over time — the rate stays at BXY when the network is fully used and rises above BXY into spare bandwidth when it is available]

Rate Allocation

RXY = max( BXY , R_TCP-like )

The limiter never drops below the guarantee BXY, but a TCP-like rate-allocation algorithm can push RXY above it when there is no congestion between X and Y.
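One step of this rule can be sketched as below. The AIMD-style update is a simplified stand-in for the actual TCP-Cubic-like algorithm, and the probe step is an illustrative constant:

```python
def update_rate(rate, b_xy, congested, probe_step=10.0):
    """One step of rate allocation for a VM-to-VM pair (sketch).
    rate, b_xy, probe_step in Mbps."""
    if congested:
        rate = rate / 2            # multiplicative decrease on congestion
    else:
        rate = rate + probe_step   # probe for spare bandwidth
    return max(b_xy, rate)         # R_XY = max(B_XY, R_TCP-like): floor at the guarantee
```

The `max` with the guarantee is what separates this from plain congestion control: a congested pair backs off only down to BXY, never below it.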

Rate Allocation

RXY = max( BXY , R_TCP-like )

[Figure: rate-limiter rate (Mbps) over a few seconds — the X→Y rate stays at or above the guarantee while another tenant's traffic comes and goes]

Rate Allocation

RXY = max( BXY , R_weighted-TCP-like )

The weight is the BXY guarantee, so competing pairs converge to shares proportional to their guarantees. Example: BXY = 100 Mbps and BZT = 200 Mbps sharing a link of L = 1 Gbps converge to RXY = 333 Mbps and RZT = 666 Mbps.
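The steady state the weighted algorithm converges to is just a guarantee-proportional split of the bottleneck, which the slide's numbers confirm. A minimal sketch (hypothetical helper, not the actual algorithm, which reaches this point iteratively):

```python
def weighted_shares(link_capacity, guarantees):
    """Guarantee-proportional split of a bottleneck link: each pair's
    steady-state rate is capacity * (its guarantee / sum of guarantees)."""
    total = sum(guarantees.values())
    return {pair: link_capacity * g / total for pair, g in guarantees.items()}

# Slide example: B_XY = 100 Mbps and B_ZT = 200 Mbps on a 1 Gbps link
# converge to ~333 Mbps and ~666 Mbps respectively.
shares = weighted_shares(1000, {"XY": 100, "ZT": 200})
```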

Rate Allocation – Congestion Detection

- Detect congestion through dropped packets: hypervisors add and monitor sequence numbers in packet headers
- Use ECN, if available
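Loss detection from the hypervisor-added sequence numbers amounts to counting gaps in the received stream. A hypothetical sketch (this ignores reordering, which a real implementation must tolerate):

```python
def count_drops(expected_next, received_seqs):
    """Count dropped packets for a VM-to-VM pair from per-pair sequence
    numbers stamped by the source hypervisor (sketch; assumes in-order
    delivery)."""
    drops = 0
    for seq in received_seqs:
        if seq > expected_next:
            drops += seq - expected_next  # packets missing in the gap
        expected_next = seq + 1
    return drops, expected_next
```

Receiving sequence numbers 0, 1, 2, 5, 6 against an expected start of 0, for example, reports two drops (packets 3 and 4).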

Rate Allocation – Adaptive Algorithm

- Use Seawall [NSDI'11] as the rate-allocation algorithm (TCP-Cubic-like)
- Essential improvements (for when using dropped packets): many flows probing for spare bandwidth can affect the guarantees of others
- Hold-increase: hold off probing for free bandwidth after a congestion event; the holding time is inversely proportional to the guarantee, so pairs with larger guarantees resume probing sooner
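The hold-increase rule can be stated in one line; the constant `k` here is illustrative, not a value from the paper:

```python
def holding_time(guarantee, k=1000.0):
    """Hold-increase (sketch): after a congestion event, a pair waits
    holding_time before probing for free bandwidth again; the wait is
    inversely proportional to its guarantee (k is an assumed constant)."""
    return k / guarantee

# A pair with a 200 Mbps guarantee resumes probing twice as fast as a
# pair with a 100 Mbps guarantee, biasing spare bandwidth toward
# larger guarantees without starving smaller ones.
```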

Outline

- Motivation and Goals
- Overview
- More Details: Guarantee Partitioning, Rate Allocation
- Evaluation

Evaluation Setup

- Implementation in Linux: logic in user space controls rate limiters and sends control packets; modified kernel OVS
- Testbed: ~100 servers, 1 Gbps tree network

Evaluation – Many-to-one

Two tenants share a 1 Gbps bottleneck link (edge or core), each with a 450 Mbps guarantee: VM X runs a single TCP flow, while VM Z receives a many-to-one UDP workload from a growing number of senders.

[Figure: throughput (Mbps) of X and Z versus the number of UDP senders to Z, from 0 to 300]

- No Protection: VM Z takes all the bandwidth as the number of senders grows
- Static Reservation (e.g., Oktopus): X's guarantee holds, but bandwidth is wasted when Z has few senders
- ElasticSwitch: work-conserving (X uses the full link when Z is idle) and provides guarantees (X keeps ~450 Mbps as senders increase) — close to the ideal behavior

Evaluation – MapReduce Setup

- 44 servers, 4x oversubscribed topology, 4 VMs/server
- Each tenant runs one job; all VMs of all tenants have the same guarantee
- Two scenarios:
  - Light: 10% of VM slots are either a mapper or a reducer, randomly placed
  - Heavy: 100% of VM slots are either a mapper or a reducer; mappers are placed in one half of the datacenter

Evaluation – MapReduce

[Figure: CDFs of worst-case shuffle completion time, normalized to static reservation, for No Protection and ElasticSwitch]

- Light setup: work-conservation pays off — jobs finish faster than under static reservation, and the longest completion is reduced compared to No Protection
- Heavy setup: ElasticSwitch enforces guarantees in the worst case; guarantees reduce worst-case shuffle completion by up to 160X

ElasticSwitch Summary

Properties:
1. Bandwidth Guarantees: hose model or derivatives
2. Work-conserving
3. Practical: oversubscribed topologies, commodity switches, decentralized

Design: two layers
- Guarantee Partitioning: provides guarantees by transforming hose-model guarantees into VM-to-VM guarantees
- Rate Allocation: enables work conservation by increasing rate limits above guarantees when there is no congestion

HP Labs is hiring!

Backup Slides

Evaluation – Overhead

[Figure: overhead of one CPU core (%) versus number of VM-to-VM flows, 0 to 300; the overhead is non-linear in the number of limiters]

- Splitting control per hosted VM should keep the operating point in the low-overhead region
- Can be significantly improved

Guarantee Partitioning – Dynamic Behavior

- Optimize for bimodally distributed flows: most flows are short, while a few flows carry most bytes; short flows care about latency, long flows care about throughput
- Start with a small guarantee for a new VM-to-VM flow; if demand is not satisfied, increase the guarantee exponentially

[Figure: long-flow throughput (Mbps) versus the rate of new short VM-to-VM pairs, from 1 to 50 new pairs/sec]

[Figure: short-flow completion time (ms) versus the number of background VM-to-VM flows (0, 1, 5, 20, and 23 with tight guarantees), comparing No Protection, Oktopus-like Reservation, and ElasticSwitch]

Future Work

- Reduce overhead: ElasticSwitch averages 1 core / 15 VMs, with a worst case of 1 core / VM
- Multi-path solution: single-path reservations are inefficient, and no existing solution works on multi-path networks

Guarantee Partitioning – Efficiency: Reallocation

Intuition: reallocation propagates from smaller to larger guarantee shares, which is why Guarantee Partitioning converges.

Rate Allocation

RXY = max( BXY , R_weighted-TCP-like ), where the weight is the BXY guarantee. The algorithm operates on a timescale an order of magnitude longer than the RTT, so it does not affect TCP.
