
PLNOG Juniper DCF - data.proidea.org.pl · DATA CENTER FABRIC COOKBOOK ... LARGE scale distributed system which acts as a single logical L2/ ... Reorder buffer in Egress PFE to put

Sep 12, 2018


DATA CENTER FABRIC COOKBOOK

Do It Yourself !

How to prepare something new from well-known ingredients

Emil Gągała


WHAT DOES AN IDEAL FABRIC LOOK LIKE?

Copyright © 2011 Juniper Networks, Inc. www.juniper.net


REQUIREMENTS - THE NETWORK FABRIC

A Network Fabric has the scalability and resilience of a network, and the performance and simplicity of a single switch:

1. Any-to-any flat connectivity with fairness and full non-blocking

2. Low latency and jitter

3. No packet drops under congestion

4. Linear cost and power scaling with the number of interfaces

5. Support of virtual Layer 2 and Layer 3 networks and services

6. Modular distributed implementation that is highly reliable and scalable

7. Single logical device


FABRIC OF SWITCHES OR … DISTRIBUTED SWITCH FABRIC

[Figure: traditional multi-tier design with a Core Layer, an Aggregation Layer and an Access Layer]


FABRIC OF SWITCHES OR … DISTRIBUTED SWITCH FABRIC

[Figure: a single Switch Fabric forming one network with flat, any-to-any connectivity]


…BUT WHAT ABOUT SCALING?

[Figure: two plots comparing a traditional multi-tier Ethernet design with a fabric. Panel "Scale vs. Latency": latency against scale. Panel "Scale vs. Bandwidth": bandwidth against scale. Each panel has one curve for "Traditional design" and one for "Fabric". Attached devices shown: Servers, NAS.]


INGREDIENTS - THE NETWORK FABRIC

1. Control Plane – Routing Engine

2. Switching Plane - Fabric

3. Forwarding Plane - I/O Modules

Single device (N=1)

• Single switch does not scale

• Single point of failure


FABRIC CONTROL PLANE


CONTROL PLANE

• Can a single Routing Engine (RE) control all the TOR switches as if they were line cards?

• The answer is NO

• Key reason: scalability issues. For example:

– On a typical router, the RE distributes the complete forwarding state to all PFEs

- This cannot scale to fabric levels, with a few thousand TORs

– Protocol processing cannot scale with such a large number of interfaces

- LACP: O(3000) sessions

- ARP: O(100K – 500K endpoints)

• Solution:

• Use multiple REs

• Distributed and virtualized control plane
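A rough back-of-the-envelope sketch of why full state replication from one RE does not scale. The endpoint count is the upper bound quoted on the slide. Using 3000 as the number of TORs and assuming one forwarding entry per endpoint are illustrative assumptions, not figures from the talk.

```python
# Figures quoted on the slide (upper bounds); everything else is an assumption.
NUM_TORS = 3000          # assumption: "a few thousand TORs", reusing the O(3000) figure
NUM_ENDPOINTS = 500_000  # upper end of the 100K - 500K ARP range


def replicated_entries(num_tors: int, num_endpoints: int) -> int:
    """Total entries if every TOR receives the complete forwarding state.

    Assumes one forwarding entry per endpoint, which is a simplification.
    """
    return num_tors * num_endpoints


total = replicated_entries(NUM_TORS, NUM_ENDPOINTS)
print(f"{total:,} entries pushed from a single RE")
```

Under these assumptions a single RE would have to push on the order of a billion entries, which is the motivation for splitting the work across multiple REs.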


3 BASIC FUNCTIONS ANY SWITCH HAS TO SOLVE

1. System Discovery

• Who are we?

2. Fabric Discovery

• How many ways can we send data to each other?

3. Control State Propagation

• How do we exchange control state between ourselves?


EXAMPLES OF THE 3 INTERNAL PROTOCOLS

1. System Discovery

• IS-IS based

• Runs on the Control Plane Ethernet Network

2. Fabric Topology Discovery

• IS-IS based

3. Control State Propagation

• Fabric Control Protocol

• BGP based


LOGICAL SYSTEM VIEW AFTER SYSTEM DISCOVERY

1. System Discovery

• IS-IS based

• Runs on the Control Plane Ethernet Network

[Figure: after system discovery, all components appear on "One Big LAN". Labels: Node 1, Node 2, Node 128; Interconnect 1, Interconnect 4; Fabric Manager; Fabric Control 1, Fabric Control 2; NNG RE (Active), NNG RE (Backup)]


WHY CENTRALIZE FABRIC DATA PLANE FORWARDING LOGIC?

2. Fabric Topology Discovery

• IS-IS based

• "Centralize what you can, distribute what you must"

• Frees us from the tyranny of equal-cost shortest path routing

• Can use all possible paths between any two QFNodes within the fabric

• Traffic on each path is proportional to the cost (or its inverse) of that path

• Faster convergence times

• Less computation-intensive

[Figure: nodes A and B connected by three parallel links. Link 1: 4 Gbps, Link 2: 2 Gbps, Link 3: 4 Gbps]

With Spanning Tree

• Only one link is used

• Complex configuration to make sure that a 4 Gbps link is used

• Effective fabric bandwidth: 4 Gbps

With TRILL (FabricPath)

• Both 4 Gbps links are used

• Distributed equal-cost constraint rules out Link 2

• Effective fabric bandwidth: 8 Gbps

With Fabric

• All links are used

• Spray weights intelligently send appropriate traffic on each link

• Effective fabric bandwidth: 10 Gbps
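The three bandwidth figures above can be reproduced with a short sketch. The link speeds come from the slide. The function names are hypothetical, and the sketch assumes that "equal cost" means "same bandwidth as the best link" and that spray weights are simply each link's share of total capacity. The talk does not spell out the actual weighting scheme.

```python
from fractions import Fraction

# Link speeds between nodes A and B, as given on the slide.
links_gbps = {"Link 1": 4, "Link 2": 2, "Link 3": 4}


def spanning_tree_bandwidth(links):
    # Only one link forwards; assume it was configured to be the fastest one.
    return max(links.values())


def equal_cost_bandwidth(links):
    # Only links whose cost equals the best cost are used (assumed: same speed).
    best = max(links.values())
    return sum(bw for bw in links.values() if bw == best)


def weighted_spray(links):
    # Every link is used; each carries a share proportional to its capacity.
    total = sum(links.values())
    weights = {name: Fraction(bw, total) for name, bw in links.items()}
    return total, weights


fabric_total, spray_weights = weighted_spray(links_gbps)
print(spanning_tree_bandwidth(links_gbps), equal_cost_bandwidth(links_gbps), fabric_total)
print({name: str(w) for name, w in spray_weights.items()})
```

With these inputs the 2 Gbps link receives one fifth of the traffic and each 4 Gbps link two fifths, so no link is left idle and none is overloaded.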


FABRIC CONTROL PROTOCOL – WHY BGP?

• Let's consider the desirable attributes of any such protocol

– Should have a built-in scaling model

– Should have multi-version support, needed for Partitions

– Should be extensible, since it needs to carry both MAC and IP routes

– Should support overlapping address spaces

– Should be hardened, since it is the heart of the system after all

• BGP fits the bill perfectly

– Route Reflector mechanism

– Standard open protocol

– TLV mechanism

– Route Distinguisher, Route Target constructs

– Field proven
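A minimal sketch of how Route Distinguisher and Route Target constructs allow overlapping address spaces, in the style of BGP L3-VPNs. It is a toy model and not the Fabric Control Protocol itself: the `Route` class, the helper functions, and the RD, target and node values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    rd: str             # route distinguisher (hypothetical "asn:id" string)
    prefix: str         # MAC or IP prefix, kept as an opaque string here
    targets: frozenset  # route targets attached when the route is exported
    next_hop: str


def advertise(table, route):
    # The (rd, prefix) pair is the key, so equal prefixes with different RDs coexist.
    table[(route.rd, route.prefix)] = route


def import_routes(table, wanted_target):
    # A receiver keeps only the routes that carry a route target it imports.
    return sorted(
        (r for r in table.values() if wanted_target in r.targets),
        key=lambda r: (r.rd, r.prefix),
    )


table = {}
advertise(table, Route("65000:1", "10.0.0.0/24", frozenset({"target:65000:100"}), "node-1"))
advertise(table, Route("65000:2", "10.0.0.0/24", frozenset({"target:65000:200"}), "node-2"))

tenant_a = import_routes(table, "target:65000:100")
print(len(table), [r.next_hop for r in tenant_a])
```

Both tenants use 10.0.0.0/24, yet the table holds two distinct routes, and each tenant imports only its own.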


ACTUAL L3-VPN APPROACH


LEVERAGING L3-VPN APPROACH IN DCF


FABRIC DATA PLANE


DIFFERENT DC FABRIC ARCHITECTURES

#1 Architecture: feature-rich core, minimal edge

• L2 at access

• L3, ACLs, buffering in core

#2 Architecture: feature-rich core, rich edge

• L2, FC, ACLs in access

• L3, FC, ACLs, buffering in core

#3 Architecture: minimal core, rich edge

• L2, L3, FC, ACLs in access

• No features in core

[Figure: each architecture is accompanied by a plot of cost against scale]


KNOWING FABRIC – IN BRIEF

• Large-scale distributed system which acts as a single logical L2/L3 switch

• Physically consists of multiple chassis

• Composed of

– An intelligent edge which makes complex forwarding decisions: TOR/LCC

– A high-speed but dumb core which transfers packets across the intelligent edge: Interconnect chassis


CLOS TOPOLOGY

Topology developed in the 1950s for telephone switching equipment

[Figure: three-stage Clos network. Sixteen inputs (I/p 0 to I/p 15) and sixteen outputs (O/p 0 to O/p 15) are arranged in four groups of four. Four First Stage elements (F1), four Second Stage elements (F2) and four Third Stage elements (F3) connect inputs to outputs.]
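A small sketch of the path diversity in the three-stage Clos network from the figure. It assumes the standard Clos wiring, in which every first-stage element connects to every second-stage element and every second-stage element connects to every third-stage element. The element counts are read off the figure labels, and the function name is hypothetical.

```python
PORTS_PER_EDGE = 4   # inputs per F1 element: I/p 0..3, 4..7, 8..11, 12..15
MIDDLE_SWITCHES = 4  # assumption: the figure shows four F2 elements


def paths(input_port: int, output_port: int):
    """List every (F1, F2, F3) path between an input port and an output port."""
    f1 = input_port // PORTS_PER_EDGE    # first-stage element that owns the input
    f3 = output_port // PORTS_PER_EDGE   # third-stage element that owns the output
    # Assumed full mesh between stages: one distinct path per middle element.
    return [(f"F1-{f1}", f"F2-{m}", f"F3-{f3}") for m in range(MIDDLE_SWITCHES)]


for path in paths(0, 15):
    print(" -> ".join(path))
```

Any input/output pair has as many paths as there are second-stage elements, which is what cell spraying relies on.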


CLOS SWITCH OPERATION

• First stage: sprays cells evenly

• Second stage: provides all-to-all connectivity

• Third stage: provides the non-blocking property

[Figure: 1st Stage (F1), 2nd Stage (F2), 3rd Stage (F3)]


CLOS PROPERTIES

Multiple redundant paths per source-destination pair.

Topology is rearrangeably non-blocking

• In a circuit-switched Clos network, a new connection can always be made if existing connections can be moved to different paths

• Spraying cells achieves the same effect as moving circuits

Each cell may have a different transit time through the fabric

Reorder buffer in the egress PFE puts cells back into sequence before the packet is forwarded

Multiple links between each stage of the fabric.

Graceful bandwidth degradation when a component fails
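Spraying and reordering can be illustrated with a minimal sketch. The round-robin spray policy, the sequence-number scheme and the `ReorderBuffer` class are assumptions made for illustration. The slides only state that cells are sprayed evenly and put back into sequence in the egress PFE.

```python
import heapq


def spray(cells, num_paths):
    # Round-robin spraying: the cell with sequence number s takes path s % num_paths.
    return [(seq, seq % num_paths, payload) for seq, payload in enumerate(cells)]


class ReorderBuffer:
    """Holds out-of-order cells and releases them strictly in sequence."""

    def __init__(self):
        self._heap = []     # min-heap of (sequence number, payload)
        self._next_seq = 0  # next sequence number allowed to leave

    def receive(self, seq, payload):
        heapq.heappush(self._heap, (seq, payload))
        released = []
        # Drain every cell that is now contiguous with what was already released.
        while self._heap and self._heap[0][0] == self._next_seq:
            released.append(heapq.heappop(self._heap)[1])
            self._next_seq += 1
        return released


sprayed = spray(["a", "b", "c", "d"], num_paths=2)

# Assume path 1 is faster than path 0, so cells 1 and 3 arrive before 0 and 2.
arrival_order = [sprayed[1], sprayed[3], sprayed[0], sprayed[2]]

buffer = ReorderBuffer()
delivered = []
for seq, _path, payload in arrival_order:
    delivered.extend(buffer.receive(seq, payload))
print("".join(delivered))
```

Cells that arrive early wait in the buffer until every earlier cell has been seen, so the packet leaves the egress side in its original order regardless of per-path delay.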


FABRIC TOPOLOGY EXAMPLE

[Figure: 128 x TOR switches (labels TOR #01 and TOR #127 visible) connected to two interconnects, DCF#0 and DCF#1; port labels 1 and 128]


YOUR DINNER


3 years in development

1 million man hours


$100s of millions invested

Over 125 patents pending
