High Speed Router Design
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute

Dec 18, 2015

Transcript
Page 1

High Speed Router Design

Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
shivkuma@ecse.rpi.edu
http://www.ecse.rpi.edu/Homepages/shivkuma

Many slides thanks to Nick McKeown (Stanford). Also based on slides of S. Keshav (Ensim), Douglas Comer (Purdue), Raj Yavatkar (Intel), Cyriel Minkenberg (IBM Zurich), Sonia Fahmy (Purdue).

Page 2

Overview

• Introduction
• Evolution of High-Speed Routers
• High Speed Router Components:
  - Lookup Algorithm
  - Switching
  - Classification, Scheduling
• Multi-Tbps Routers: Challenges & Trends

Page 3

What do switches/routers look like?

• Access routers, e.g. ISDN, ADSL
• Core router, e.g. OC48c POS
• Core ATM switch

Page 4

Dimensions, Power Consumption

Router            Height   Width   Depth    Capacity   Power
Cisco GSR 12416   6 ft     19”     2 ft     160 Gb/s   4.2 kW
Juniper M160      3 ft     19”     2.5 ft   80 Gb/s    2.6 kW
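The architect's figures of merit on the following slides are capacity per unit of power and per unit of volume. A back-of-the-envelope sketch that divides the numbers in the table above; treating each chassis as a rectangular box (height × width × depth) is an assumption, and `ROUTERS` and `density` are illustrative names.

```python
# Figures from the table above; the 19" width is converted to feet.
ROUTERS = {
    "Cisco GSR 12416": {"h_ft": 6.0, "w_in": 19.0, "d_ft": 2.0, "gbps": 160.0, "kw": 4.2},
    "Juniper M160":    {"h_ft": 3.0, "w_in": 19.0, "d_ft": 2.5, "gbps": 80.0,  "kw": 2.6},
}

def density(r):
    """Return (Gb/s per kW, Gb/s per cubic foot), assuming a box-shaped chassis."""
    volume_ft3 = r["h_ft"] * (r["w_in"] / 12.0) * r["d_ft"]
    return r["gbps"] / r["kw"], r["gbps"] / volume_ft3

for name, r in ROUTERS.items():
    per_kw, per_ft3 = density(r)
    print(f"{name}: {per_kw:.1f} Gb/s per kW, {per_ft3:.2f} Gb/s per cubic foot")
```

By this crude measure the GSR 12416 delivers roughly 38 Gb/s per kW against roughly 31 Gb/s per kW for the M160.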

Page 5

Where high performance packet switches are used

• The Internet Core:
  - Carrier Class Core Router
  - ATM Switch
  - Frame Relay Switch
• Edge Router
• Enterprise WAN access & Enterprise Campus Switch

Page 6

Where are routers? Answer: at Points of Presence (POPs).

[Figure: endpoints A, B, C and D, E, F connected through a network of eight POPs, POP1 to POP8.]

Page 7

Why the Need for Big/Fast/Large Routers?

[Figure: a POP built from smaller routers compared with a POP built from large routers.]

• Interfaces: price > $200k, power > 400 W.
• Space, power, and interface cost economics!
• About 50-60% of interfaces are used for interconnection within the POP.
• The industry trend is towards a single large router per POP.
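To make the economics concrete, here is a rough sketch of what intra-POP interconnection costs. It treats the slide's lower bounds ($200k and 400 W per interface) as exact figures, and the 40-interface POP with 55% of interfaces used internally is a hypothetical example within the slide's 50-60% range.

```python
def pop_interconnect_overhead(total_ifaces, interconnect_fraction,
                              price_per_iface=200_000, watts_per_iface=400):
    """Interfaces, dollars, and watts spent only on connecting routers inside the POP.
    Per-interface price and power default to the slide's lower bounds."""
    internal = round(total_ifaces * interconnect_fraction)
    return internal, internal * price_per_iface, internal * watts_per_iface

# Hypothetical POP: 40 interfaces, 55% of them used for interconnection.
internal, dollars, watts = pop_interconnect_overhead(40, 0.55)
print(internal, dollars, watts)  # 22 4400000 8800
```

Replacing several interconnected routers with one large router removes most of these internal interfaces, which is the argument for a single router per POP.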

Page 8

Job of router architect

For a given set of features: maximize capacity $C$, subject to power $P < 5\ \text{kW}$ and volume $V < 2\ \text{m}^3$.
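The architect's problem above can be sketched as "filter by the constraints, then take the largest capacity". The candidate designs and their figures are hypothetical, as are the `Design` and `best_design` names; the 5 kW and 2 m³ limits are the slide's, applied as strict inequalities.

```python
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    capacity_gbps: float
    power_kw: float
    volume_m3: float

def best_design(designs, max_power_kw=5.0, max_volume_m3=2.0):
    """Return the feasible design with the largest capacity, or None if none fits."""
    feasible = [d for d in designs
                if d.power_kw < max_power_kw and d.volume_m3 < max_volume_m3]
    return max(feasible, key=lambda d: d.capacity_gbps, default=None)

# Hypothetical candidates, for illustration only.
candidates = [
    Design("A", 320, 4.5, 1.5),
    Design("B", 640, 7.0, 1.8),   # exceeds the power budget
    Design("C", 480, 4.8, 2.4),   # exceeds the volume budget
    Design("D", 160, 2.6, 0.8),
]
print(best_design(candidates).name)  # A
```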

Page 9

Performance metrics

1. Capacity: “maximize C, s.t. volume < 2 m³ and power < 5 kW”

2. Throughput: maximize usage of expensive long-haul links. This is trivial with work-conserving output-queued routers.

3. Controllable Delay: some users would like predictable delay. This is feasible with output queueing plus weighted fair queueing (WFQ).
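A minimal sketch of how WFQ orders packets at an output queue. It assumes every packet is already queued at time 0, which sidesteps the GPS virtual-clock bookkeeping that full WFQ needs when packets arrive over time: each packet's finish time is then its flow's previous finish time plus length/weight, and the scheduler always sends the smallest finish time. Breaking ties by flow name is an arbitrary choice, and `wfq_order` is an illustrative name.

```python
import heapq

def wfq_order(queues, weights):
    """Transmission order under WFQ when every packet is queued at time 0.

    queues:  flow name -> list of packet lengths (bytes), in arrival order
    weights: flow name -> positive weight
    """
    heap = []
    for flow, lengths in queues.items():
        finish = 0.0
        for seq, length in enumerate(lengths):
            finish += length / weights[flow]          # virtual finish time
            heapq.heappush(heap, (finish, flow, seq, length))
    order = []
    while heap:
        _, flow, _, length = heapq.heappop(heap)      # smallest finish time first
        order.append((flow, length))
    return order

# Flow A has twice B's weight, so it sends about two packets per packet of B.
queues = {"A": [100, 100, 100, 100], "B": [100, 100]}
weights = {"A": 2.0, "B": 1.0}
print([flow for flow, _ in wfq_order(queues, weights)])
```

Because each flow's share of the link is bounded below by its weight, the delay a packet sees can be bounded independently of how aggressively other flows send, which is what makes the delay controllable.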

Page 10

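Relative performance increase

[Figure: relative performance increase, 1996 to 2002, log scale from 100% to 100000%.]

• DWDM link speed: ×2 every 8 months
• Router capacity: ×2.2 every 18 months
• Moore’s law: ×2 every 18 months
• DRAM access rate: ×1.1 every 18 months
• Internet: ×2 per year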
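The gap between these curves compounds quickly. A small sketch that applies each growth rate from the chart over a six-year horizon (the horizon is an arbitrary choice; `growth` and `TRENDS` are illustrative names):

```python
def growth(factor, period_months, horizon_months):
    """Compound growth: multiply by `factor` every `period_months`, over `horizon_months`."""
    return factor ** (horizon_months / period_months)

TRENDS = {  # (factor, period in months), as read off the chart above
    "DWDM link speed": (2.0, 8),
    "Router capacity": (2.2, 18),
    "Moore's law": (2.0, 18),
    "DRAM access rate": (1.1, 18),
    "Internet": (2.0, 12),
}

for name, (factor, period) in TRENDS.items():
    print(f"{name}: x{growth(factor, period, 72):.1f} over 6 years")
```

Over six years link speed grows 512× while DRAM access rate grows less than 1.5×, which is why memory, not logic, becomes the bottleneck.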

Page 11

Alt: Memory Bandwidth, Commercial DRAM

Memory speed is not keeping up with Moore’s Law.

[Figure: access time (ns) vs. year, 1980 to 2001, log scale. Trend lines: DRAM 1.1× / 18 months; Moore’s Law 2× / 18 months; Router Capacity 2.2× / 18 months; Line Capacity 2× / 7 months.]

Page 12

An Example: Packet buffers for a 40Gb/s router linecard

[Figure: a buffer manager in front of a 10 Gbit buffer memory. Write rate R: one 40B packet every 8 ns. Read rate R: one 40B packet every 8 ns.]

Use SRAM?
+ Fast enough random access time, but
- Too low density to store 10 Gbits of data.

Use DRAM?
+ High density means we can store the data, but
- Can’t meet the random access time.
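Where the slide's numbers come from, as a short sketch. The 8 ns follows directly from 40 bytes at 40 Gb/s; since the memory must absorb one write and one read in that window, it sees a random access every 4 ns. The 10 Gbit figure matches the widely used "buffer = RTT × line rate" rule of thumb with an RTT of 250 ms; both the rule and the 250 ms value are assumptions here, as the slide states only the result.

```python
def packet_time_ns(packet_bytes, rate_bps):
    """Time to transmit one packet at the given line rate, in nanoseconds."""
    return packet_bytes * 8 / rate_bps * 1e9

def rtt_buffer_bits(rtt_s, rate_bps):
    """Rule-of-thumb buffer size (assumed): round-trip time times line rate."""
    return rtt_s * rate_bps

R = 40e9                          # 40 Gb/s linecard
t = packet_time_ns(40, R)         # one 40-byte packet written (and one read) every t ns
print(f"packet time: {t:.1f} ns, one memory access every {t / 2:.1f} ns")
print(f"buffer: {rtt_buffer_bits(0.25, R) / 1e9:.0f} Gbit")   # assumes a 250 ms RTT
```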

Page 13

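E.g.: Problems with Output Queuing

Output queued switches are impractical.

[Figure: N inputs, each at rate R, feeding output 1. Each output’s DRAM buffer must accept data at rate NR while its output link drains it at rate R.]

Can’t I just use N separate memory devices per output?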
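The arithmetic behind "impractical": in the worst case all N inputs send to the same output in one cell time, so that output's buffer must accept N writes while also supplying one read for the link. A sketch, with a hypothetical 16-port, 10 Gb/s switch as the example; it only computes the bandwidth and does not answer the slide's question about splitting the buffer across N devices.

```python
def oq_memory_bandwidth(n_ports, rate_bps):
    """Bandwidth one output's buffer memory needs in an N-port output-queued switch:
    up to N simultaneous writes (N*R) plus one read for the output link (R)."""
    return (n_ports + 1) * rate_bps

# Hypothetical example: a 16-port switch with 10 Gb/s ports.
print(oq_memory_bandwidth(16, 10e9) / 1e9, "Gb/s per output memory")
```

The requirement grows linearly with the port count N, while DRAM access rate barely improves.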

Page 14

Packet processing is getting harder

[Figure: CPU instructions per minimum-length packet since 1996, log scale, 1996 to 2001.]

Page 15

Basic Ideas: Part I

Page 16

First-Generation IP Routers

• Most Ethernet switches and cheap packet routers
• Bottleneck can be CPU, host-adaptor or I/O bus
• What is costly? Bus? Memory? Interface? CPU?

[Figure: a CPU and buffer memory on a shared backplane, with line interfaces (each with DMA and MAC) attached to the same backplane.]

Page 17

First Generation Routers

[Figure: a CPU with route table and buffer memory on a shared backplane; line interfaces (MAC) attached to the backplane.]

• Fixed length “DMA” blocks or cells, reassembled on the egress linecard
• Fixed length cells or variable length packets
• Typically <0.5Gb/s aggregate capacity
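Segmenting packets into fixed-length blocks and reassembling them on the egress linecard can be sketched as follows. The 48-byte payload, the header fields (`seq`, `last`, `valid`), and the function names are hypothetical; the slides give no cell format.

```python
CELL_PAYLOAD = 48  # hypothetical cell payload size in bytes

def segment(packet: bytes, payload: int = CELL_PAYLOAD):
    """Ingress: split a variable-length packet into fixed-length cells."""
    cells = []
    for seq, start in enumerate(range(0, len(packet), payload)):
        chunk = packet[start:start + payload]
        cells.append({
            "seq": seq,                              # position within the packet
            "last": start + payload >= len(packet),  # marks the final cell
            "valid": len(chunk),                     # bytes of real data in this cell
            "data": chunk.ljust(payload, b"\x00"),   # pad the final cell to full size
        })
    return cells

def reassemble(cells):
    """Egress: rebuild the packet from its cells, which may arrive in any order."""
    ordered = sorted(cells, key=lambda c: c["seq"])
    if [c["seq"] for c in ordered] != list(range(len(ordered))):
        raise ValueError("missing cell")
    if not ordered or not ordered[-1]["last"]:
        raise ValueError("incomplete packet")
    return b"".join(c["data"][:c["valid"]] for c in ordered)

packet = bytes(range(100))
cells = segment(packet)
print(len(cells), [c["valid"] for c in cells])
```

Fixed-size units keep the bus or fabric timing simple, at the cost of padding the last cell of each packet.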

Page 18

First Generation Routers: Queueing Structure: Shared Memory

[Figure: inputs 1 to N and outputs 1 to N connected through a single shared memory.]

• Large, single, dynamically allocated memory buffer: N writes per “cell” time, N reads per “cell” time.
• Limited by memory bandwidth.
• Numerous works have proven and made possible: fairness, delay guarantees, delay variation control, loss guarantees, statistical guarantees.
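A sketch of the "single dynamically allocated buffer": one pool of cell slots shared by all outputs, a free list, and a per-output queue of slot indices. Any output can use the whole pool, but every cell costs one write and one read in the same memory, which is where the N writes plus N reads per cell time come from. The class and method names, and the use of `deque` for the free list and output queues, are illustrative assumptions rather than how any particular router implements it.

```python
from collections import deque

class SharedMemoryBuffer:
    """One pool of cell slots shared by all outputs, allocated on demand."""

    def __init__(self, num_slots, num_outputs):
        self.slots = [None] * num_slots
        self.free = deque(range(num_slots))                  # free-slot list
        self.queues = [deque() for _ in range(num_outputs)]  # per-output slot indices

    def write(self, output, cell):
        """Store a cell for `output`; return False (drop) if the pool is full."""
        if not self.free:
            return False
        slot = self.free.popleft()
        self.slots[slot] = cell
        self.queues[output].append(slot)
        return True

    def read(self, output):
        """Remove and return the oldest cell for `output`, or None if it has none."""
        if not self.queues[output]:
            return None
        slot = self.queues[output].popleft()
        cell, self.slots[slot] = self.slots[slot], None
        self.free.append(slot)                               # slot is reusable by any output
        return cell

buf = SharedMemoryBuffer(num_slots=2, num_outputs=3)
print(buf.write(0, "a"), buf.write(2, "b"), buf.write(1, "c"))  # pool fills after two cells
```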

Page 19

Second-Generation IP Routers

[Figure: a CPU and buffer memory on the bus, plus line cards, each with DMA, MAC, and local buffer memory.]

• Port mapping intelligence in line cards
• Higher hit rate in local lookup cache
• What is costly? Bus? Memory? Interface? CPU?

Page 20

Second Generation Routers

[Figure: a CPU with route table; line cards, each with MAC, buffer memory, and a forwarding cache. Cache misses take the slow path to the CPU. Each output link has scheduling and a drop policy or backpressure.]

Typically <5Gb/s aggregate capacity
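The forwarding cache on each line card, with misses sent over the slow path to the central CPU, can be sketched as a small LRU cache. Keying entries by exact destination, the LRU replacement policy, and the `RouteCache` name are assumptions; `dict.get` on a hypothetical table stands in for the CPU's route lookup.

```python
from collections import OrderedDict

class RouteCache:
    """Per-linecard forwarding cache (LRU) in front of a slow-path lookup on the CPU."""

    def __init__(self, capacity, slow_path):
        self.capacity = capacity
        self.slow_path = slow_path     # callable: destination -> output port
        self.entries = OrderedDict()   # destination -> port, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, dst):
        if dst in self.entries:
            self.entries.move_to_end(dst)       # mark as most recently used
            self.hits += 1
            return self.entries[dst]
        self.misses += 1
        port = self.slow_path(dst)              # miss: ask the CPU
        self.entries[dst] = port
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        return port

# Hypothetical route table held by the CPU.
cpu_routes = {"a": 1, "b": 2, "c": 3}
cache = RouteCache(capacity=2, slow_path=cpu_routes.get)
for dst in ["a", "b", "a", "c", "b"]:
    cache.lookup(dst)
print(cache.hits, cache.misses)
```

Once the set of active destinations is much larger than the cache, most lookups miss and fall back to the CPU, which is what eventually made caching ineffective.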

Page 21

Second Generation Routers: As caching became ineffective

[Figure: each line card now holds a full forwarding table alongside its MAC and buffer memory; the CPU with the route table acts as an exception processor.]
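With a full forwarding table on each line card, every packet needs a longest-prefix match. A minimal sketch using Python's standard `ipaddress` module; the linear scan over prefixes sorted longest-first only shows the semantics, whereas real line cards use faster structures such as tries or TCAMs. The class name, routes, and port names are hypothetical.

```python
import ipaddress

class ForwardingTable:
    """Longest-prefix-match forwarding table (linear scan, for clarity only)."""

    def __init__(self):
        self.routes = []  # (network, next hop), kept sorted longest prefix first

    def add(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))
        self.routes.sort(key=lambda route: route[0].prefixlen, reverse=True)

    def lookup(self, address):
        """Return the next hop of the most specific matching prefix, or None."""
        ip = ipaddress.ip_address(address)
        for network, next_hop in self.routes:
            if ip in network:
                return next_hop
        return None

table = ForwardingTable()
table.add("0.0.0.0/0", "default")
table.add("10.0.0.0/8", "port1")
table.add("10.1.0.0/16", "port2")
print(table.lookup("10.1.2.3"), table.lookup("10.9.9.9"), table.lookup("192.0.2.1"))
```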

Page 22

Second Generation Routers: Queuing Structure: Combined Input and Output Queuing (CIOQ)

[Figure: input and output queues connected by a bus.]

1 write per “cell” time, 1 read per “cell” time. Rate of writes/reads determined by bus speed.

Page 23

Third-Generation Switches/Routers

[Figure: line cards (MAC, local buffer memory) and a CPU card connected by a switched backplane with many line interfaces.]

• Third generation switch provides parallel paths (fabric)
• What’s costly? Bus? Memory? CPU?

Page 24

Third Generation Routers

[Figure: line cards with MAC, local buffer memory, and a forwarding table; a CPU card with memory and the routing table; all attached to a switched backplane.]

Typically <50Gb/s aggregate capacity

Page 25

Third Generation Routers: Queueing Structure

[Figure: queues on either side of a switch fabric controlled by an arbiter.]

1 write per “cell” time, 1 read per “cell” time. Rate of writes/reads determined by switch fabric speedup.
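Why speedup sets the memory rate: with fabric speedup S, an input buffer is written at the line rate R and read by the fabric at up to S·R, and an output buffer mirrors this. Each memory therefore needs (1 + S)·R, independent of the number of ports, in contrast to the (N + 1)·R of an output-queued switch. A sketch with a hypothetical 10 Gb/s line rate; `cioq_memory_bandwidth` is an illustrative name.

```python
def cioq_memory_bandwidth(speedup, rate_bps):
    """Bandwidth one buffer memory needs in a CIOQ switch with fabric speedup S.
    Input memory: written at R, read by the fabric at up to S*R.
    Output memory: written by the fabric at up to S*R, read at R."""
    return (1 + speedup) * rate_bps

for s in (1, 2):
    print(f"speedup {s}: {cioq_memory_bandwidth(s, 10e9) / 1e9:.0f} Gb/s per memory")
```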

Page 26

Third Generation Routers: Queueing Structure: VOQs

[Figure: an arbiter controlling the switch between input queues and output queues.]

1 write per “cell” time, 1 read per “cell” time. Rate of writes/reads determined by switch fabric speedup.

• Inputs: per-flow/class or per-output queues (VOQs)
• Outputs: per-flow/class or per-input queues
• Flow-control backpressure
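The slides do not say which algorithm the arbiter runs. One well-known choice for VOQ switches is a round-robin request/grant/accept matcher in the style of iSLIP; the sketch below runs a single iteration per cell time on an N × N crossbar. The function name, the VOQ occupancy matrix, and the example traffic are hypothetical.

```python
def islip_iteration(voq, grant_ptr, accept_ptr):
    """One request/grant/accept round over an N x N crossbar.

    voq[i][j]:     cells waiting at input i for output j (VOQ occupancy)
    grant_ptr[j]:  round-robin pointer of output j
    accept_ptr[i]: round-robin pointer of input i
    Returns {input: output} for matched pairs and updates the pointers.
    """
    n = len(voq)
    # Request: each input requests every output for which its VOQ is non-empty.
    requests = {j: [i for i in range(n) if voq[i][j] > 0] for j in range(n)}
    # Grant: each output grants the requesting input nearest its pointer.
    grants = {}
    for j, inputs in requests.items():
        if inputs:
            i = min(inputs, key=lambda inp: (inp - grant_ptr[j]) % n)
            grants.setdefault(i, []).append(j)
    # Accept: each input accepts the granting output nearest its pointer.
    # Pointers advance one past the partner only when a grant is accepted.
    match = {}
    for i, outputs in grants.items():
        j = min(outputs, key=lambda out: (out - accept_ptr[i]) % n)
        match[i] = j
        grant_ptr[j] = (i + 1) % n
        accept_ptr[i] = (j + 1) % n
    return match

voq = [[1, 1],   # input 0 holds a cell for output 0 and one for output 1
       [1, 0]]   # input 1 holds a cell for output 0
grant_ptr, accept_ptr = [0, 0], [0, 0]
for cell_time in range(2):
    match = islip_iteration(voq, grant_ptr, accept_ptr)
    for i, j in match.items():
        voq[i][j] -= 1   # each matched VOQ sends one cell through the crossbar
    print(cell_time, match)
```

Because each input keeps a separate queue per output, a cell blocked at the head of one VOQ no longer holds up cells for other outputs; in the example, input 0's second cell and input 1's cell go through together in the second cell time.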

Page 27

Third Generation Routers: limits

[Figure: an equipment rack, 19” or 23” wide and 7’ tall.]

• Size-constrained: 19” or 23” wide.
• Power-constrained: roughly < 8 kW.

Supply: 100A/200A maximum at 48V
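The supply limits translate into a power ceiling through $P = V \cdot I$. Ignoring conversion losses (an assumption), the two supply ratings give:

$$
P_{\max} = V \cdot I_{\max}: \qquad 48\ \mathrm{V} \times 100\ \mathrm{A} = 4.8\ \mathrm{kW}, \qquad 48\ \mathrm{V} \times 200\ \mathrm{A} = 9.6\ \mathrm{kW}
$$

so a rack on the larger supply tops out below 10 kW before any margin is taken.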

Page 28

Fourth Generation: Clustering/Multi-stage

[Figure: a switch core connected to linecards over optical links hundreds of feet long.]

Page 29

Key: Physically Separating Switch Core and Linecards

• Distributes power over multiple racks.
• Allows all buffering to be placed on the linecard:
  - Reduces power.
  - Places complex scheduling, buffer management, drop policy, etc. on the linecard.

Page 30

Fourth Generation Routers/Switches

[Figure: a switch core connected to linecards over optical links hundreds of feet long.]

The LCS Protocol

Page 31

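Physical Separation

[Figure: a linecard and a switch port, each with an LCS block, physically separated. Message sequence: 1: Req (linecard to switch); 2: Grant/credit (switch to linecard); 3: Data. Messages carry a sequence number. The switch port holds per-queue counters and a switch scheduler that drives the switch fabric; a request/grant exchange takes 1 RTT.]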
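The request/grant/data exchange above can be sketched as two objects: a switch-side scheduler that keeps only a counter per (input, output) queue, and a linecard that keeps the cells themselves until a grant arrives. This is not the actual LCS message format; the class names, message fields, and the "lowest-numbered input first" grant policy are hypothetical. Over a real optical link each request/grant exchange costs one RTT, so the linecard must hold cells for at least that long.

```python
from collections import deque

class SwitchScheduler:
    """Switch-side state: one request counter per (input, output) queue."""

    def __init__(self, n_ports):
        self.counters = [[0] * n_ports for _ in range(n_ports)]

    def request(self, inp, out):
        self.counters[inp][out] += 1                 # step 1: Req arrives from a linecard

    def grant(self, out):
        """Step 2: grant one outstanding request for `out`, or return None."""
        for inp, row in enumerate(self.counters):    # hypothetical policy: lowest input first
            if row[out] > 0:
                row[out] -= 1
                return inp
        return None

class Linecard:
    """Linecard-side state: cells stay in per-output queues until granted."""

    def __init__(self, port, scheduler):
        self.port = port
        self.scheduler = scheduler
        self.queues = {}                             # output -> deque of cells
        self.next_seq = 0

    def enqueue(self, out, cell):
        self.queues.setdefault(out, deque()).append(cell)
        self.scheduler.request(self.port, out)

    def on_grant(self, out):
        """Step 3: send the head cell of the granted queue with a sequence number."""
        cell = self.queues[out].popleft()
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        return {"seq": seq, "src": self.port, "dst": out, "cell": cell}

sched = SwitchScheduler(n_ports=2)
cards = [Linecard(p, sched) for p in range(2)]
cards[0].enqueue(1, "x")
cards[1].enqueue(1, "y")
while (winner := sched.grant(1)) is not None:
    print(cards[winner].on_grant(1))
```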

Page 32

Physical Separation: Aligning Cells

[Figure: a switch core (switch scheduler, switch fabric, and LCS blocks) connected to several linecards, each with its own LCS block.]

Page 33

Fourth Generation Routers/Switches: Queueing Structure

[Figure: ingress linecards perform lookup & drop policy into virtual output queues; a bufferless switch core performs switch arbitration through the switch fabric; egress linecards perform output scheduling.]

1 write per “cell” time, 1 read per “cell” time. Rate of writes/reads determined by switch fabric speedup.

Typically <5Tb/s aggregate capacity