Virtual Layer 2: A Scalable and Flexible Data-Center Network Work with Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta Microsoft Research Changhoon Kim

Dec 19, 2015

Page 1

Virtual Layer 2: A Scalable and Flexible Data-Center Network

Work with Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta

Microsoft Research

Changhoon Kim

Page 2

Tenets of Cloud-Service Data Center

• Agility: Assign any servers to any services
  – Boosts cloud utilization
• Scaling out: Use large pools of commodities
  – Achieves reliability, performance, low cost

Page 3

What is VL2?

• Why is agility important?
  – Today’s DC network inhibits the deployment of other technical advances toward agility

• With VL2, cloud DCs can enjoy agility in full

The first DC network that enables agility in a scaled-out fashion

Page 4

Status Quo: Conventional DC Network

Reference – “Data Center: Load balancing Data Center Services”, Cisco 2004

[Figure: conventional DC network hierarchy. The Internet, core routers (CR), and access routers (AR) form DC-Layer 3; Ethernet switches (S) and racks of app. servers (A) form DC-Layer 2.]

Key
• CR = Core Router (L3)
• AR = Access Router (L3)
• S = Ethernet Switch (L2)
• A = Rack of app. servers

~ 1,000 servers/pod == IP subnet

Page 5

Conventional DC Network Problems

[Figure: the conventional topology (CR, AR, S, A) annotated with the ratios ~ 5:1, ~ 40:1, and ~ 200:1]

• Dependence on high-cost proprietary routers
• Extremely limited server-to-server capacity

Page 6

And More Problems …

[Figure: the conventional topology (CR, AR, S, A) with the server racks divided into IP subnet (VLAN) #1 and IP subnet (VLAN) #2, annotated ~ 200:1]

• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)

Page 7

And More Problems …

[Figure: the conventional topology (CR, AR, S, A) with the server racks divided into IP subnet (VLAN) #1 and IP subnet (VLAN) #2, annotated ~ 200:1 and “Complicated manual L2/L3 re-configuration”]

• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)

Page 8

And More Problems …

[Figure: the conventional topology (CR, AR, S, A) with the server racks annotated “Revenue lost” and “Expense wasted”]

• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)

Page 9

Know Your Cloud DC: Challenges

• Instrumented a large cluster used for data mining and identified distinctive traffic patterns
• Traffic patterns are highly volatile
  – A large number of distinctive patterns even in a day
• Traffic patterns are unpredictable
  – Correlation between patterns very weak

Optimization should be done frequently and rapidly

Page 10

Know Your Cloud DC: Opportunities

• DC controller knows everything about hosts
• Host OS’s are easily customizable
• Probabilistic flow distribution would work well enough, because …
  – Flows are numerous and not huge – no elephants!
  – Commodity switch-to-switch links are substantially thicker (~ 10x) than the maximum thickness of a flow

DC network can be made simple
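The argument for probabilistic flow distribution can be illustrated with a small simulation. This is only a sketch under assumed numbers: the link count, flow counts, flow sizes, and the helper names `link_loads` and `imbalance` are hypothetical and do not come from the talk. It places flows on links uniformly at random and compares a few large flows against many small flows carrying the same total traffic.

```python
import random
from typing import List


def link_loads(num_links: int, num_flows: int, flow_size: float, seed: int = 0) -> List[float]:
    """Place each flow on a uniformly random link and return the per-link load."""
    rng = random.Random(seed)
    loads = [0.0] * num_links
    for _ in range(num_flows):
        loads[rng.randrange(num_links)] += flow_size
    return loads


def imbalance(loads: List[float]) -> float:
    """Ratio of the busiest link's load to the mean load (1.0 is perfectly even)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Same total traffic (160 units over 8 links) in both cases; all numbers are illustrative.
elephants = imbalance(link_loads(num_links=8, num_flows=16, flow_size=10.0))
mice = imbalance(link_loads(num_links=8, num_flows=1600, flow_size=0.1))
print(f"few large flows:  max/mean = {elephants:.2f}")
print(f"many small flows: max/mean = {mice:.2f}")
```

With many flows that are each small relative to a link, the busiest link should stay close to the mean, which is the intuition behind the two sub-points on the slide.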

Page 11

All We Need is Just a Huge L2 Switch, or an Abstraction of One

[Figure: the conventional CR/AR/S hierarchy and its server racks (A)]

Page 12

All We Need is Just a Huge L2 Switch, or an Abstraction of One

The Illusion of a Huge L2 Switch

1. L2 semantics
2. Uniform high capacity
3. Performance isolation

[Figure: server racks (A)]

Page 13

Specific Objectives and Solutions

Objective | Approach | Solution
1. Layer-2 semantics | Employ flat addressing | Name-location separation & resolution service
2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic | Flow-based random traffic indirection (Valiant LB)
3. Performance isolation | Enforce hose model using existing mechanisms only | TCP

Page 14

VL2

Addressing and Routing: Name-Location Separation

• Servers use flat names
• Switches run link-state routing and maintain only switch-level topology

Cope with host churn with very little overhead

[Figure: servers x, y, and z attached to ToR1 through ToR4. Packets carry the payload and the destination name (y or z) behind a ToR address (ToR3 or ToR4). Arrows labeled “Lookup & Response” connect to a Directory Service whose entries are shown twice: (x → ToR2, y → ToR3, z → ToR4) and (x → ToR2, y → ToR3, z → ToR3).]

Page 15

VL2

Addressing and Routing: Name-Location Separation

• Servers use flat names
• Switches run link-state routing and maintain only switch-level topology

Cope with host churn with very little overhead

[Figure: servers x, y, and z attached to ToR1 through ToR4. Packets carry the payload and the destination name (y or z) behind a ToR address (ToR3 or ToR4). Arrows labeled “Lookup & Response” connect to a Directory Service whose entries are shown twice: (x → ToR2, y → ToR3, z → ToR4) and (x → ToR2, y → ToR3, z → ToR3).]

• Allows use of low-cost switches
• Protects network and hosts from host-state churn
• Obviates host and switch reconfiguration
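The name-location separation on this slide can be sketched in a few lines. This is a minimal illustration, not the VL2 implementation: the classes `Packet`, `EncapsulatedPacket`, and `DirectoryService` and the function `encapsulate` are hypothetical names, and the directory is a plain in-memory dictionary. The entries mirror the two directory states drawn on the slide, where z's entry changes from ToR4 to ToR3.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Packet:
    """A packet as a server emits it: flat destination name plus payload."""
    dst_name: str
    payload: bytes


@dataclass(frozen=True)
class EncapsulatedPacket:
    """The same packet wrapped with the location (ToR) of the destination."""
    dst_tor: str
    inner: Packet


class DirectoryService:
    """Hypothetical in-memory name -> ToR mapping standing in for the slide's Directory Service."""

    def __init__(self) -> None:
        self._location: Dict[str, str] = {}

    def update(self, name: str, tor: str) -> None:
        self._location[name] = tor

    def lookup(self, name: str) -> Optional[str]:
        return self._location.get(name)


def encapsulate(packet: Packet, directory: DirectoryService) -> EncapsulatedPacket:
    """Resolve the flat name to a ToR and prepend that location to the packet."""
    tor = directory.lookup(packet.dst_name)
    if tor is None:
        raise KeyError(f"no location registered for {packet.dst_name!r}")
    return EncapsulatedPacket(dst_tor=tor, inner=packet)


directory = DirectoryService()
for name, tor in [("x", "ToR2"), ("y", "ToR3"), ("z", "ToR4")]:
    directory.update(name, tor)

before = encapsulate(Packet("z", b"payload"), directory)
directory.update("z", "ToR3")  # z's location changes; only the directory entry is updated
after = encapsulate(Packet("z", b"payload"), directory)
print(before.dst_tor, after.dst_tor)  # prints "ToR4 ToR3"
```

The server's name z never changes in this sketch; only the directory entry does, which is why switches need no per-host state.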

Page 16

VL2

Example Topology: Clos Network

[Figure: Clos network of intermediate (Int), aggregation (Aggr), and top-of-rack (TOR) switches; 20 servers per TOR; K aggr switches with D ports; 20*(DK/4) servers in total]

Offer huge aggr capacity and multi paths at modest cost

Page 17

VL2

Example Topology: Clos Network

[Figure: Clos network of intermediate (Int), aggregation (Aggr), and top-of-rack (TOR) switches; 20 servers per TOR; K aggr switches with D ports; 20*(DK/4) servers in total]

Offer huge aggr capacity and multi paths at modest cost

D (# of 10G ports) | Max DC size (# of servers)
48 | 11,520
96 | 46,080
144 | 103,680
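The sizing formula 20*(DK/4) can be checked against the table with a short calculation. The function name `max_servers` is hypothetical, and setting K equal to D is an assumption: the slide does not state it, but it is the choice that reproduces the three table entries.

```python
def max_servers(d_ports: int, k_aggr: int, servers_per_tor: int = 20) -> int:
    """Servers supported by the slide's formula: servers_per_tor * (D * K / 4)."""
    return servers_per_tor * (d_ports * k_aggr) // 4


for d in (48, 96, 144):
    # Assumption: K = D, which is what reproduces the table's figures.
    print(d, max_servers(d, d))
# prints:
# 48 11520
# 96 46080
# 144 103680
```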

Page 18

Traffic Forwarding: Random Indirection

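[Figure: server x sends to y and z across ToRs T1 to T6. Packets carry the payload and the destination name behind the destination ToR (T3 for y, T5 for z). Several switches are labeled IANY. Legend: links used for up paths, links used for down paths.]

Cope with arbitrary TMs with very little overhead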

Page 19

Traffic Forwarding: Random Indirection

[Figure: server x sends to y and z across ToRs T1 to T6. Packets carry the payload and the destination name behind the destination ToR (T3 for y, T5 for z). Several switches are labeled IANY. Legend: links used for up paths, links used for down paths.]

Cope with arbitrary TMs with very little overhead

[ ECMP + IP Anycast ]
• Harness huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today
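Flow-based random indirection can be sketched as a stable hash from a flow's 5-tuple to one intermediate switch. This is an illustration under stated assumptions, not how a switch implements ECMP: the SHA-256 hash stands in for whatever hash the hardware uses, and `pick_intermediate`, the `FlowKey` layout, and the switch names `Int1` to `Int4` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List, Tuple

FlowKey = Tuple[str, str, int, int, str]  # (src, dst, src_port, dst_port, protocol)


def pick_intermediate(flow: FlowKey, intermediates: List[str]) -> str:
    """Map a flow to one intermediate switch using a stable hash of its 5-tuple."""
    digest = hashlib.sha256(repr(flow).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(intermediates)
    return intermediates[index]


intermediates = ["Int1", "Int2", "Int3", "Int4"]  # hypothetical switch names

# Every packet of one flow maps to the same intermediate switch.
flow = ("x", "y", 40000, 80, "tcp")
assert pick_intermediate(flow, intermediates) == pick_intermediate(flow, intermediates)

# Many distinct flows spread across the intermediates.
flows = [("x", "y", 40000 + i, 80, "tcp") for i in range(10_000)]
load = Counter(pick_intermediate(f, intermediates) for f in flows)
print(sorted(load.items()))
```

Hashing per flow rather than per packet keeps each flow on one path, while the spread across many flows provides the load balancing.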

Page 20

Does VL2 Ensure Uniform High Capacity?

• How “high” and “uniform” can it get?
  – Performed all-to-all data shuffle tests, then measured aggregate and per-flow goodput
  – Goodput efficiency: 94%
  – Fairness§ between flows: 0.995
• The cost for flow-based random spreading

[Figure: Fairness of Aggr-to-Int links’ utilization. x-axis: Time (s), 0 to 500; y-axis: Fairness Index§, 0.80 to 1.00]

§ Jain’s fairness index defined as $(\sum_i x_i)^2 / (n \cdot \sum_i x_i^2)$
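Jain’s fairness index from the footnote can be computed directly. The function name `jain_fairness_index` and the sample inputs are illustrative, not measurements from the talk.

```python
from typing import Sequence


def jain_fairness_index(x: Sequence[float]) -> float:
    """Return (sum of x_i)^2 / (n * sum of x_i^2) for the n values in x."""
    n = len(x)
    if n == 0:
        raise ValueError("need at least one value")
    total = sum(x)
    sum_sq = sum(v * v for v in x)
    if sum_sq == 0:
        raise ValueError("index is undefined when every value is zero")
    return (total * total) / (n * sum_sq)


print(jain_fairness_index([10.0, 10.0, 10.0, 10.0]))  # equal shares: prints 1.0
print(jain_fairness_index([40.0, 0.0, 0.0, 0.0]))     # one flow gets everything: prints 0.25
```

The index is 1 when all values are equal and falls toward 1/n as a single value dominates.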

Page 21

VL2 Conclusion

• VL2 achieves agility at scale via
  1. L2 semantics
  2. Uniform high capacity between servers
  3. Performance isolation between services

Lessons

• Randomization can tame volatility
• Add functionality where you have control
• There’s no need to wait!