Page 3: SDN in Warehouse Scale Datacenters v2.0

SDN in Warehouse Scale Datacenters

v2.0

Igor Gashinsky

Principal Architect

Yahoo!

April 17th, 2012

Page 4: SDN in Warehouse Scale Datacenters v2.0

Some Terminology

Page 5: SDN in Warehouse Scale Datacenters v2.0

What is SDN and OpenFlow

[Figure: side-by-side comparison of "Networking today" and "Servers today". Labels: Control Plane, Forwarding Plane, Management Plane, other components (CPU), HW management plane, generic PC, OS (Linux, FreeBSD, Windows, Solaris), applications (Apache, IIS, etc.), x86 instruction set, OpenFlow, SDN.]

Page 6: SDN in Warehouse Scale Datacenters v2.0

What stayed the same?

Page 7: SDN in Warehouse Scale Datacenters v2.0

Datacenter Virtualization

Page 8: SDN in Warehouse Scale Datacenters v2.0

Why SDN for Virtualization?

• Requirements:
  • 20k servers per cluster = 400k VMs
  • Full any-to-any communication
  • Place any VM anywhere, anytime
  • VM migration (sub-second)
  • Guaranteed consistency model

• Problem:
  • How do you keep 20k devices in sync w/ 400k+ entities each?

• Solutions:
  • Current hardware can't keep up with FIB requirements
  • Current routing protocols don't do this well
    • Lack of a consistency model
    • Flood and (s)pray doesn't work very well!
  • Program the vSwitch from a central, distributed database!
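
The "program the vSwitch from a central database" idea can be sketched roughly as follows. This is a minimal illustration, not the design from the talk: the `VSwitch` and `CentralDB` classes, the version counter, and the resync step are all assumptions made for the example, and the 20 VMs per server figure is only implied by the slide's 400k / 20k numbers.

```python
from dataclasses import dataclass, field

SERVERS_PER_CLUSTER = 20_000
VMS_PER_SERVER = 20  # assumption: implied by 400k VMs / 20k servers


@dataclass
class VSwitch:
    """Hypothetical per-host vSwitch holding a local copy of VM locations."""
    host: str
    version: int = 0
    table: dict = field(default_factory=dict)

    def apply(self, version, vm, location):
        # Accept updates strictly in order; a gap means this copy is stale.
        if version != self.version + 1:
            return False
        self.table[vm] = location
        self.version = version
        return True


class CentralDB:
    """Hypothetical central store acting as the source of truth for VM placement."""

    def __init__(self, vswitches):
        self.vswitches = vswitches
        self.version = 0
        self.locations = {}

    def place(self, vm, host):
        # Record the placement (or migration), then push it to every vSwitch.
        self.version += 1
        self.locations[vm] = host
        for vs in self.vswitches:
            if not vs.apply(self.version, vm, host):
                self.resync(vs)

    def resync(self, vs):
        # A vSwitch that missed updates receives a full copy of the table.
        vs.table = dict(self.locations)
        vs.version = self.version
```

Because every update carries a version number, a vSwitch can tell when it has missed one, which is the kind of consistency check that flooding-based protocols do not give you.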

Page 9: SDN in Warehouse Scale Datacenters v2.0

How has the market & vision evolved?

Page 10: SDN in Warehouse Scale Datacenters v2.0

Predictions from 6 months ago:

[Figure: two stacks of Control Plane, Forwarding Plane, and Management Plane, labeled "Then" and "SDN".]

Page 11: SDN in Warehouse Scale Datacenters v2.0

What we see now

[Figure: two stacks of Control Plane, Forwarding Plane, and Management Plane, labeled "Then" and "SDN", with an added "Unix + APIs" label.]

Page 12: SDN in Warehouse Scale Datacenters v2.0

Why is that important?

Page 13: SDN in Warehouse Scale Datacenters v2.0

Configuration & Deployment Automation

[Figure: scale from "Manual" to "Magic" with markers "Network Today", "Network Tomorrow", and "Servers Today".]

Page 14: SDN in Warehouse Scale Datacenters v2.0

Self-Healing Fabrics

(and a pony!)

Page 15: SDN in Warehouse Scale Datacenters v2.0

Blast from the past (Y! Presentation to HSSG in 2007)

[Figure: topology diagram of L3 switches interconnected over 10GE, with eight repeated groups of L3 switches, N x switches, and hosts (H) attached over GE.]

** 8-way ECMP w/ 2x10GE LAGs **

** Way too many paths **

** Way too many cables **

Page 16: SDN in Warehouse Scale Datacenters v2.0

Today's Topologies

• CLOS-like
  • 20k server cluster ~= 16k internal links

• This means up to 1024 distinct links between a pair of hosts

• How do you troubleshoot this (for packet loss, etc.)?
  • # of links to test = 1024/2 = 512
  • 30 seconds/test
  • 256 man-minutes for the most basic troubleshooting!

• Is that acceptable?

• Really? :)
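
The troubleshooting arithmetic on this slide can be checked with a few lines of Python. The function name is made up for the example; the inputs (1024 links, half of them tested, 30 seconds per test) are the slide's own figures.

```python
LINKS_BETWEEN_HOSTS = 1024  # from the slide
SECONDS_PER_TEST = 30       # from the slide


def troubleshooting_cost(links, seconds_per_test):
    """Return (links to test, total man-minutes), testing half of the links."""
    links_to_test = links // 2
    minutes = links_to_test * seconds_per_test / 60
    return links_to_test, minutes


links_to_test, minutes = troubleshooting_cost(LINKS_BETWEEN_HOSTS, SECONDS_PER_TEST)
print(links_to_test, minutes)  # 512 256.0
```

256 minutes is a little over four hours of manual work for a single pair of hosts.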

Page 17: SDN in Warehouse Scale Datacenters v2.0

Enter SDN

• Local Testing Agent
  • Looks at interface counters (errors, etc.)
  • Performs interface/RIB/FIB health checking

• Local Repair Agent
  • Performs local repairs (i.e. FIB consistency check)
  • If certain conditions are met, automatically removes failed link(s)
  • If unsure of a safe action, asks the controller

• Global Controller
  • Has full visibility of the entire network
  • Can initiate repair/fixup actions of its own
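
The split between the local agents and the global controller might look something like the sketch below. Everything here is illustrative: the `LinkStats` fields, the `ERROR_THRESHOLD` and `MIN_HEALTHY_LINKS` values, and the "enough healthy links remain" rule are assumptions standing in for the "certain conditions" the slide leaves unspecified.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    REMOVE_LINK = "remove_link"
    ASK_CONTROLLER = "ask_controller"


@dataclass
class LinkStats:
    """Hypothetical per-link snapshot gathered by the local testing agent."""
    name: str
    rx_errors: int
    fib_matches_rib: bool


ERROR_THRESHOLD = 100   # assumed value, not from the talk
MIN_HEALTHY_LINKS = 2   # assumed safety margin, not from the talk


def link_is_healthy(stats):
    # Testing agent: low error counters and a FIB that agrees with the RIB.
    return stats.rx_errors < ERROR_THRESHOLD and stats.fib_matches_rib


def decide(stats, healthy_links_remaining):
    # Repair agent: act locally only when removal is clearly safe.
    if link_is_healthy(stats):
        return Action.NONE
    if healthy_links_remaining >= MIN_HEALTHY_LINKS:
        return Action.REMOVE_LINK
    # Not sure removal is safe: defer to the controller's global view.
    return Action.ASK_CONTROLLER
```

For example, `decide(LinkStats("eth1", 500, True), 4)` returns `Action.REMOVE_LINK`, while the same failing link with only one healthy alternative left is escalated to the controller.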

Page 18: SDN in Warehouse Scale Datacenters v2.0

Where's my pony?