All-Path Bridging Update IEEE Plenary meeting Atlanta 7-10 Nov. Jun Tanaka (Fujitsu Labs. Ld.) Guillermo Ibanez (UAH, Madrid, Spain) Vinod Kumar (Tejas Networks)

Mar 27, 2015


Elizabeth Kerr

All-Path Bridging Update

IEEE Plenary meeting Atlanta 7-10 Nov.

Jun Tanaka (Fujitsu Labs. Ltd.)

Guillermo Ibanez (UAH, Madrid, Spain)

Vinod Kumar (Tejas Networks)


Contents

• All-Path Basics

• Issues

• Report of All-Path Demos

• Report of proposal to AVB WG


Problem Statement

IEEE 802.1D RSTP has the following limitations:
– Not all links can be used
– The shortest path is not always used
– No multipath available
– The root bridge tends to be heavily loaded
– Not scalable


Objectives

• To overcome RSTP limitations
– Loop free
– All links to be used
– Provide shortest path
– Provide multipath
– Compatible with 802.1D/Q
– No new tag or new frame to be defined
– Zero configuration


TRILL SPB


All-Path Basics (One-way)

[Figure: five bridges (1 to 5) connecting host S and host D; S sends an ARP_req. Legend: port locked to S, port locked to D.]


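All-Path Basics (One-way)

[Figure: same five-bridge topology; copies of the ARP_req from S reach a bridge on more than one port, and the later copies are discarded (X). Legend: port locked to S, port locked to D.]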

The port that receives the frame first is locked to S:
– Register S in a table
– Start the lock timer
– Learn S at the port

A port that receives the frame later discards it:
– Check S against the table while the lock timer is still running
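The first-come rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `AllPathBridge`, the method `on_broadcast`, and the choice to refresh the lock when another frame arrives on the already locked port are assumptions made for the example. The 1 s lock time is the example value quoted elsewhere in the slides.

```python
from dataclasses import dataclass, field

LOCK_TIME = 1.0  # seconds; the slides give "~1 s" as an example lock timer


@dataclass
class AllPathBridge:
    """Toy model of the first-come (lock) table of a single bridge."""
    lock_time: float = LOCK_TIME
    # source MAC -> (locked port, time at which the lock expires)
    first_come: dict = field(default_factory=dict)
    # forwarding database: MAC -> port (aging is omitted in this sketch)
    fdb: dict = field(default_factory=dict)

    def on_broadcast(self, src_mac: str, in_port: int, now: float) -> bool:
        """Return True if the frame is accepted, False if it is discarded."""
        entry = self.first_come.get(src_mac)
        if entry is not None:
            locked_port, expires = entry
            if now < expires and in_port != locked_port:
                return False  # late copy on a different port: discard
        # First copy, expired lock, or a frame on the locked port
        # (assumption: the lock is refreshed): register S, start the
        # lock timer, and learn S at this port.
        self.first_come[src_mac] = (in_port, now + self.lock_time)
        self.fdb[src_mac] = in_port
        return True


bridge = AllPathBridge()
first = bridge.on_broadcast("S", in_port=1, now=0.0)   # accepted, port 1 locked
late = bridge.on_broadcast("S", in_port=2, now=0.1)    # discarded
```

Once the lock timer has expired, a frame from S arriving on another port is accepted again and the port is re-learned.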


All-Path Basics (One-way)

[Figure: same five-bridge topology after the ARP_req has been flooded; ports on the first-arrival path are locked to S and later copies are discarded (X).]


All-Path Basics (Two-way)

[Figure: same five-bridge topology; the ARP_reply from D travels back towards S, and ports along its path become locked to D.]

If the DA is in the FDB: unicast forwarding, same as 802.1D.


All-Path Basics (Two-way)

[Figure: same five-bridge topology with the ARP_reply further along its way to S; more ports are locked to D.]


Needful Things

802.1D:

• Forwarding database (large: e.g. 16k or more entries)

• Aging timer (long: e.g. 300 s)

+ All-Path:

• First-come table (small: e.g. ~1k entries)

• Lock timer (short: e.g. ~1 s)

• Filtering logic (for late-arriving frames)


Minimum aging time of Lock timer

[Figure: timing diagram. A frame is received at the first port (learning) and later at the second port (discarding); each bridge adds processing time (forwarding, learning, classification, tagging, queuing, etc.). The diagram marks the minimum aging time.]

FP: First Port, SP: Second Port

The aging timer must still be valid when the frame is received from the second port, so that this frame is discarded.

The first-come table aging time shall be longer than 2 × (one-way link delay + processing delay). In a data center, it can be less than 1 ms.
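As a worked example of the bound above, the sketch below evaluates 2 × (one-way link delay + processing delay). The function name and the input figures (100 m of fibre at roughly 5 ns per metre, and 10 microseconds of processing delay) are illustrative assumptions, not values from the slides.

```python
def min_lock_time(one_way_link_delay_s: float, processing_delay_s: float) -> float:
    """Lower bound on the first-come table aging (lock) time, in seconds."""
    return 2.0 * (one_way_link_delay_s + processing_delay_s)


# Hypothetical data-center figures: 100 m of fibre at about 5 ns/m, and an
# assumed 10 microseconds of processing delay.
link_delay = 100 * 5e-9
bound = min_lock_time(link_delay, 10e-6)
print(f"lock time must exceed {bound * 1e6:.1f} microseconds")
```

With these assumed inputs the bound is in the tens of microseconds, which is consistent with the slide's statement that less than 1 ms can suffice in a data center.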


Scope of All-Path

[Figure: positioning chart with axes Scalability and Manageability, ranging from LAN to MAN/WAN and comparing RSTP/MSTP, ALL-PATH, and SPB, ECMP, TRILL.]

• ALL-PATH: enterprise, campus, small datacenter, home network, etc. Simple, less operation, natural load balance.

• SPB, ECMP, TRILL: large area, provider network, large datacenter, etc.

• Both support: loop free, shortest path


Issues

1. Path recovery

2. Server edge

3. Load balance


1. Path Recovery – Original idea

[Figure: five bridges (1 to 5) connecting host S and host D, with an ARP_req being sent.]

• Mechanism: When an unknown unicast frame arrives at a bridge with a failed link, a path-fail message is generated per MAC entry towards the source bridge, which then generates the corresponding ARP to re-establish the tree.

• Question: If 10K MAC entries exist in the FDB, 10K path-fail frames must be generated. Is this feasible for the local CPU, especially on a high-speed link (e.g. 10GE)?


1. Path Recovery – Original idea

[Figure: same five-bridge topology with ports locked to S and to D; Path_fail messages are sent after a link failure.]


1. Path Recovery – Selective flush

[Figure: six switches (SW1 to SW6) with numbered ports, hosts with MAC=a and MAC=b, the ports where "a" and "b" are learned, and flush “b” messages propagating between switches.]

Figure annotations:

• The flush message is terminated because “b” is not bound to port 1.

• A flush message may include two or more MAC addresses (e.g. hundreds) to be flushed, as a list.

• Delete entry “b” from the FDB and re-send the flush message to SW1.

• When a link failure is detected, MAC flush lists are flooded.

– 54 frames (187 MACs per 1500 B frame) for 10K MAC entries.

• To avoid unnecessary flooding, MAC entries are deleted to shorten the list.

• Issues: how to prevent flush frame loss; may require CPU processing power.

• Experience: 15 ms to flush 10K MACs in a node (1 GHz MIPS core)

(Fujitsu)
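The frame count quoted above, and the per-switch handling of a flush list, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the rule that a MAC is flushed and propagated only when it is bound to the port on which the flush message arrived is one reading of the figure annotations, not a confirmed specification.

```python
import math

MACS_PER_FRAME = 187  # figure quoted on the slide for a 1500 B frame


def flush_frames_needed(num_macs: int, macs_per_frame: int = MACS_PER_FRAME) -> int:
    """Number of flush frames needed to carry num_macs MAC addresses."""
    return math.ceil(num_macs / macs_per_frame)


def handle_flush(fdb: dict, in_port: int, flush_list: list) -> list:
    """Delete listed MACs bound to in_port; return the MACs to propagate.

    Assumption: a MAC that is not bound to the receiving port is left
    alone and is not propagated (the flush "terminates" for that MAC).
    """
    propagated = []
    for mac in flush_list:
        if fdb.get(mac) == in_port:
            del fdb[mac]
            propagated.append(mac)
    return propagated


frames = flush_frames_needed(10_000)  # 10K MAC entries, as on the slide

fdb = {"a": 2, "b": 1}
to_forward = handle_flush(fdb, in_port=1, flush_list=["b"])
```

Dropping already-terminated MACs from the returned list is what shortens the flush list as it moves away from the failed link.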


1. Path Recovery – Loop back (UAH)

• Low processing at the bridges with the failed link: loopback is part of the standard forwarding table

• Processing load is distributed among the source edge bridges involved in the flows. Only one side (SA>DA) asks for repair.

• Resiliency: if the first looped-back packet is lost, the subsequent looped-back frames will follow.


2. Server Edge

• Question: If a server has two or more NICs, how do we determine which port received the frame first?

• vswitch: only the vswitch has to support All-Path
• VEB: both the VEB and the vswitch have to support All-Path
• VEPA: only the external switch has to support All-Path

[Figure: three server edge configurations: vswitch with NICs, vswitch with VEB NICs, and VEPA NIC with an external switch.]


3. Load Balance (Fujitsu)

[Figure: topology with switches SW1 to SW5, and a plot of throughput versus elapsed time for SW1, SW2 and SW3.]

• Load balancing emerges naturally, because heavily loaded links tend not to be selected owing to their queuing delay.

• Pros: zero-configuration load balancing
• Cons: the load balance cannot be controlled as with SPB/ECMP
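The effect can be illustrated with a toy model in which the copy of a broadcast that accumulates the least delay arrives first and therefore fixes the path. The function names, the three-switch topology (A, B, C) and the delay values are hypothetical and chosen only for illustration.

```python
def path_delay(path, link_delay, queue_delay):
    """Sum propagation and queuing delay over the links of a path."""
    links = list(zip(path, path[1:]))
    return sum(link_delay[link] + queue_delay.get(link, 0.0) for link in links)


def first_arrival_path(paths, link_delay, queue_delay):
    """Path whose copy of the broadcast arrives first (lowest total delay)."""
    return min(paths, key=lambda p: path_delay(p, link_delay, queue_delay))


# Hypothetical example: a direct link A-C and a detour via B (arbitrary units).
link_delay = {("A", "C"): 1.0, ("A", "B"): 1.0, ("B", "C"): 1.0}
paths = [("A", "C"), ("A", "B", "C")]

idle = first_arrival_path(paths, link_delay, queue_delay={})
busy = first_arrival_path(paths, link_delay, queue_delay={("A", "C"): 5.0})
```

With empty queues the direct link wins; once the direct link carries enough queuing delay, the detour wins the race and new flows are steered away from the loaded link without any configuration.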


Load Distribution (UAH simulations)

• Objectives:
– Explain the native load distribution results of the Singapore presentation
– Visualize how on-demand path selection avoids the loaded links

• Topology:
– A subset of the links of a small data center topology, to show path selection at the core
– Core link capacity is lower (100 Mbps) to force load distribution and congestion only at the core
– Queues support up to 100,000 frames (so that they add delay rather than discard frames)

• Traffic: stepped in sequence, left to right
– Green servers send UDP packets towards red servers
– Groups of 25 servers initiate communication every second: the first at second 1, the second at second 2, the third at second 3, and so on. Finally, the last group is a single server that starts communicating at second 4 of the simulation.
– UDP packets (1 packet every 1 ms, simultaneous for all servers). The packet size varies between 90 and 900 bytes across the simulations to simulate increasing traffic loads.
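For orientation, the load that one group of 25 servers offers to the core can be estimated from the figures above. The sketch counts only the stated packet size and ignores any additional header overhead, which is a simplifying assumption; the function name is hypothetical.

```python
CORE_LINK_MBPS = 100     # core link capacity stated for the simulations
PACKETS_PER_SECOND = 1000  # 1 packet every 1 ms per server


def group_load_mbps(servers: int, packet_bytes: int) -> float:
    """Offered load of one server group in Mbps (packet size only)."""
    return servers * packet_bytes * 8 * PACKETS_PER_SECOND / 1e6


for size in (90, 300, 900):
    load = group_load_mbps(25, size)
    print(f"{size} B: {load:.0f} Mbps per group, "
          f"{load / CORE_LINK_MBPS:.0%} of one core link")
```

Under this estimate a single group stays well below one 100 Mbps core link at 90 B, takes more than half of it at 300 B, and exceeds it at 900 B.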


Simulation I – UDP packet size: 90B

[Figure: core topology with switches S1, S2, S3 and S4; server groups of 25 (x 25) start at 1s, 2s, 3s and 4s.]

Server group: paths
1: s3-s4
2: s3-s4 and s3-s2-s4
3: s3-s4 and s3-s2-s4
4: s3-s1-s4

# flows (per link, as labelled in the figure): 51; 1, 1; 24, 24; 0

Note: the path s3-s4 is reused several times because it is still lightly loaded (low traffic).


Simulation I – UDP packet size: 300B

[Figure: core topology with switches S1, S2, S3 and S4.]

Server group: paths
1: s3-s4
2: s3-s1-s4 and s3-s2-s4
3: s3-s2-s4
4: s3-s4

# flows (per link, as labelled in the figure): 26; 14, 14; 36, 36; 0

Note: the path s3-s4 is not reused when the 2nd group starts; that group uses s3-s1-s4 and s3-s2-s4 instead, and the 3rd group behaves similarly. The 4th group reuses s3-s4 because it is again the fastest path once s1 and s2 are loaded by groups 2 and 3.


Simulation I – UDP packet size: 900B

[Figure: core topology with switches S1, S2, S3 and S4; server groups of 25 (x 25) start at 1s, 2s, 3s and 4s.]

Server group: paths
1: s3-s4
2: s3-s1-s4
3: s3-s2-s4
4: s3-s4

# flows (per link, as labelled in the figure): 26; 25, 25; 25, 25; 0

900B means some frames are being discarded at the queues (too much traffic). Group 1 chooses s3-s4 and fully loads it; group 2 chooses s3-s1-s4 and the same happens; group 3 chooses s3-s2-s4 and the same happens. When group 4 starts, every link (except the one between s1 and s2) is fully loaded, so s3-s4 is again the fastest path and is chosen.


Load distribution conclusions

• Notice how the # of flows gets distributed over the links in the core when the traffic increases, due to increased latency.
– Load distribution starts at low loads
– Path diversity increases with load

• Similar balancing effect observed in redundant links from an access switch to two core switches

• On demand path selection finds paths adapted to current, instantaneous conditions, not to past or assumed traffic matrix


Report on Proposal for AVB TG

• May 12 (Thu), morning session @ AVB
• Dr. Ibanez presented the materials used in the IW sessions (Singapore and Santa Fe)
• Questions and comments
– Any metric other than latency, e.g. bandwidth?
– How does the path recovery time compare with RSTP?
– Does any broadcast storm occur when a link fails?
– What is the status in the IW session? Has any PAR been created?
• AVB status
– They are trying to solve it in their own way, using SRP.
– Not only latency but also bandwidth can be used as a metric
– Redundant paths can also be calculated


Path Selection with SRP


at-phkl-SRP-Stream-Path-Selection-0311-v01.pdf


REPORT OF ALL PATH DEMOS
- TORONTO: SIGCOMM, AUGUST 2011
- BONN: LCN, OCTOBER 2011


Demo at SIGCOMM 2011

• HW NetFPGA implementation
• Four NetFPGAs (4 × 1 Gbps)
• Demo:
– Zero configuration
– Video streaming, high throughput
– Robustness, no frame loops
– Fast path recovery
– Internet connection, standard hosts
• http://conferences.sigcomm.org/sigcomm/2011/papers/sigcomm/p444.pdf


Demo at IEEE LCN 2011 (October, Bonn)
OpenFlow and Linux (OpenWRT) All-Path switches

[Figure: demo setup with a NOX OpenFlow controller and an Ethernet switch.]


• One NEC switch split into 4 OpenFlow switches

• Four Soekris boards as 4 OpenFlow switches
• Two Linksys WRT routers running the ARP-Path over Linux implementation
• Video streaming and Internet access without host changes
– Some video limitations on the OpenWRT routers
– Smooth operation on Soekris and NEC

• Reference: A Small Data Center Network of ARP-Path Bridges Made of Openflow Switches. Guillermo Ibáñez (UAH); Jad Naous (MIT/Stanford Univ.); Elisa Rojas (UAH); Bart De Schuymer (Art in Algorithms, Belgium); Thomas Dietz (NEC Europe Ltd., Germany)


Feedback from All Path-UAH demos

• At every demo most people asked for an explanation of how ARP Path works (the available video was shown)
• People were intrigued by the mechanism and interested in the reconfiguration of flows and native loop avoidance
• Amount of state stored per bridge: per host or per bridge. (Encapsulating versions, Q-in-Q and M-in-M, are possible but are not the target; already covered by SPB)
• Questions on compatibility and miscibility with standard bridges (automatic core-island mode, no full miscibility)
• Collateral questions on NetFPGA and on the LCN demo topology
• Next step: implementation on a commodity Ethernet switch (FPGA) (chip/switch manufacturers are invited to provide a switch platform) and implementation of interoperability with 802.1D bridges in the Linux version


Conclusions

• All-Path bridging is a reality
– A new class of transparent low-latency bridges
• Do not compute; find the path by direct probing
• Zero configuration
• Robust, loop free
• Native load distribution
• Paths are not predictable, but resilient: paths adapt to traffic, and traffic is not predictable
• Low latency