© 2006 Cisco Systems, Inc. All rights reserved. Cisco Confidential CRS-1 Technion 1 CRS-1 overview TAU – Mar 07 Rami Zemach
Transcript
Page 1: Cisco crs1

© 2006 Cisco Systems, Inc. All rights reserved. Cisco Confidential CRS-1 Technion 1

CRS-1 overview

TAU – Mar 07

Rami Zemach

Page 2: Cisco crs1

Agenda

Cisco’s high end router CRS-1

CRS-1’s Fabric

CRS-1’s Line Card

CRS-1’s NP Metro (SPP)

Future directions

Page 3: Cisco crs1

What drove the CRS?

OC768

Multi chassis

Improved BW/Watt & BW/Space

New OS (IOS-XR)

Scalable control plane

A sample taxonomy

Page 4: Cisco crs1

Multiple router flavours

Core: OC-12 (622 Mbps) and up (to OC-768 ~= 40 Gbps)

Big, fat, fast, expensive

E.g. Cisco HFR, Juniper T-640

HFR: 1.2 Tbps each, interconnect up to 72 giving 92 Tbps, start at $450k

Transit/Peering-facing: OC-3 and up, good GigE density

ACLs, full-on BGP, uRPF, accounting

Customer-facing: FR/ATM/…

Feature set as above, plus fancy queues, etc

Broadband aggregator: high scalability (sessions, ports, reconnections)

Feature set as above

Customer-premises (CPE): 100 Mbps

NAT, DHCP, firewall, wireless, VoIP, …

Low cost, low-end, perhaps just software on a PC
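
The OC-n rates quoted in this taxonomy follow from the SONET base rate: OC-n runs at n × 51.84 Mb/s. A minimal sketch of that arithmetic (the function name `oc_rate_mbps` is illustrative and not from the slides):

```python
# SONET optical carrier line rate: OC-n = n * 51.84 Mb/s (OC-1 base rate).
OC1_MBPS = 51.84

def oc_rate_mbps(n: int) -> float:
    """Return the nominal line rate of OC-n in Mb/s."""
    return n * OC1_MBPS

# Levels mentioned in the taxonomy, plus OC-192 used on the line card slides.
for level in (3, 12, 192, 768):
    print(f"OC-{level}: {oc_rate_mbps(level):.2f} Mb/s")
```

This gives roughly 622 Mb/s for OC-12 and roughly 40 Gb/s for OC-768, in line with the figures on the slide.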

Page 5: Cisco crs1

Routers are pushed to the edge

Over time routers are pushed to the edge as:

BW requirements grow

# of interfaces scale

Different routers have different offerings:

Interface types (core is mostly Ethernet)

Features. Sometimes the same feature is implemented differently

User interface

Redundancy models

Operating system

Customers look for:

Investment protection

Stable network topology

Feature parity

Transparent scale

Page 6: Cisco crs1

What does scaling mean …

Interfaces (BW, number, variance)

BW

Packet rate

Features (e.g. Support link BW in a flexible manner)

More Routes

Wider ecosystem

Effective Management (e.g. capability to support more BGP peers and more events)

Fast Control (e.g. distribute routing information)

Availability

Serviceability

Scaling is both up and down (logical routers)

Page 7: Cisco crs1

Low BW feature rich – centralized

[Figure: centralized router. Line interfaces (each with a MAC) share a bus with the CPU, route table, and off-chip buffer memory.]

Typically <0.5Gb/s aggregate capacity

Page 8: Cisco crs1

High BW – distributed

[Figure: distributed router. Line cards (each with a MAC, local buffer memory, and a forwarding table) and a CPU card (CPU, memory, routing table) are connected by a “crossbar” switched backplane.]

Typically <50Gb/s aggregate capacity

Page 9: Cisco crs1

Distributed architecture challenges (examples)

HW wise

Switching fabric

High BW switching

QOS

Traffic loss

Speedup

Data plane (SW)

High BW / packet rate

Limited resources (cpu, memory)

Control plane (SW)

High event rate

Routing information distribution (e.g. forwarding tables)

Page 10: Cisco crs1

CRS-1 System View

Fabric Shelves: contain fabric cards and system controllers

Line Card Shelves: contain route processors, line cards, and system controllers

NMS (full system view)

Out of band GE control bus to all shelf controllers

[Figure: fabric shelves and line card shelves, each with shelf controllers and a system controller, joined by the control bus. Distance label: 100m.]

Page 11: Cisco crs1

CRS-1 System Architecture

FORWARDING PLANE: Up to 1152x40G, 40G throughput per LC

MULTISTAGE SWITCH FABRIC: 1296x1296 non-blocking buffered fabric

Roots of Fabric architecture from Jon Turner’s early work

DISTRIBUTED CONTROL PLANE: Control SW distributed across multiple control processors

[Figure: line cards (Interface Module, mid-plane, Modular Service Card with Cisco SPP and 8K Qs) and Route Processors attached to the Fabric Chassis, which holds S1, S2, and S3 stages in 8 planes (1 of 8, 2 of 8, … 8 of 8).]
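
The headline capacity figures can be cross-checked with simple arithmetic. The sketch below assumes 16 line cards per line card shelf, and assumes that the 92 Tbps figure quoted in the taxonomy counts ingress and egress separately; neither assumption is stated on the slides.

```python
# Figures taken from the slides.
MAX_LC_SHELVES = 72       # "interconnect up to 72"
GBPS_PER_LC = 40          # "40G throughput per LC"

# Assumption (not stated on the slides): 16 line cards per shelf.
LCS_PER_SHELF = 16

line_cards = MAX_LC_SHELVES * LCS_PER_SHELF        # 1152, matching "1152x40G"
one_way_tbps = line_cards * GBPS_PER_LC / 1000     # 46.08 Tbps in one direction

# Assumption: the marketing figure counts both directions.
both_ways_tbps = 2 * one_way_tbps                  # 92.16 Tbps

print(line_cards, one_way_tbps, both_ways_tbps)
```

Under these assumptions the 1152 line card count and the roughly 92 Tbps total are consistent with each other.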

Page 12: Cisco crs1

Switch Fabric challenges

Scale - many ports

Fast

Distributed arbitration

Minimum disruption with QOS model

Minimum blocking

Balancing

Redundancy

Page 13: Cisco crs1

Previous solution: GSR – Cell based XBAR with centralized scheduling

Each LC has variable width links to and from the XBAR, depending on its bandwidth requirement

Central scheduling ISLIP based

Two request-grant-accept rounds

Each arbitration round lasts one cell time

Per destination LC virtual output queues

Supports H/L priority, unicast/multicast

[Figure: linecard (emphasizing fabric interface), XBAR switching matrix, and fabric scheduler (both showing connections for just one linecard). The linecard's request/grant control exchanges request and grant signals with the scheduler, which drives XBAR control. Ingress data passes through the virtual output queues (cell availability information, cell transmit control) onto the to-fabric lanes; egress data arrives on the from-fabric lanes into the reassembly queues.]

1 to 16 transmit and receive lanes; # of lanes varies per linecard type based on bandwidth

One Output Queue per destination linecard

One Reassembly Queue per source linecard (and per unicast/multicast)
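
The request-grant-accept arbitration can be sketched as follows. This is a simplified iSLIP-style matcher for illustration only, not the GSR scheduler itself; the function name `islip_match`, the data layout, and the omission of priorities and multicast are all assumptions.

```python
def islip_match(requests, grant_ptr, accept_ptr, iterations=2):
    """One cell time of a simplified iSLIP-style scheduler.

    requests[i] is the set of outputs for which input i has a queued cell
    (its non-empty virtual output queues). grant_ptr[j] and accept_ptr[i]
    are round-robin pointers, updated in place.
    Returns a dict mapping each matched input to its output.
    """
    n = len(requests)
    match = {}                  # input -> output
    matched_outputs = set()
    for it in range(iterations):
        # Grant: each unmatched output picks the requesting unmatched input
        # closest to its pointer in round-robin order.
        grants = {}             # input -> list of outputs that granted to it
        for out in range(n):
            if out in matched_outputs:
                continue
            for k in range(n):
                inp = (grant_ptr[out] + k) % n
                if inp not in match and out in requests[inp]:
                    grants.setdefault(inp, []).append(out)
                    break
        # Accept: each input picks the granting output closest to its pointer.
        for inp, outs in grants.items():
            out = min(outs, key=lambda o: (o - accept_ptr[inp]) % n)
            match[inp] = out
            matched_outputs.add(out)
            if it == 0:
                # Pointers advance only for grants accepted in the first round.
                grant_ptr[out] = (inp + 1) % n
                accept_ptr[inp] = (out + 1) % n
    return match


# Three linecards: input 0 has cells for outputs 0 and 1, input 1 for 0, input 2 for 2.
requests = [{0, 1}, {0}, {2}]
grant_ptr, accept_ptr = [0, 0, 0], [0, 0, 0]
first = islip_match(requests, grant_ptr, accept_ptr)
second = islip_match(requests, grant_ptr, accept_ptr)
print(first, second)
```

In the second cell time the pointers have moved on, so input 1 gets output 0 and the contention seen in the first cell time is resolved.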

Page 14: Cisco crs1

CRS Cell based Multi-Stage Benes

• Multiple paths to a destination

• For a given input to output port, the no. of paths is equal to the no. of center stage elements

• Distribution between S1 and S2 stages. Routing at S2 and S3

• Cell routing
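
A minimal sketch of the path count and of cell distribution over the center stage. The slides say only that cells are distributed between S1 and S2; the round-robin policy here, and the names `paths` and `spray`, are assumptions made for illustration.

```python
from itertools import cycle

def paths(src_s1, dst_s3, num_s2):
    """All S1 -> S2 -> S3 paths between one S1 element and one S3 element."""
    return [(src_s1, s2, dst_s3) for s2 in range(num_s2)]

def spray(cells, src_s1, dst_s3, num_s2):
    """Distribute cells over the center stage round-robin (an assumed policy)."""
    s2_choice = cycle(range(num_s2))
    return [(cell, (src_s1, next(s2_choice), dst_s3)) for cell in cells]

# With 4 center-stage elements there are 4 paths for any input/output pair.
print(len(paths(0, 3, 4)))
for cell, path in spray(["c0", "c1", "c2", "c3", "c4"], 0, 3, 4):
    print(cell, path)
```

Because consecutive cells of one packet take different S2 elements, they can arrive at the destination out of order.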

Page 15: Cisco crs1

Fabric speedup

Q-fabric tries to approximate an output buffered switch to minimize sub-port blocking

Buffering at output allows better scheduling

In single stage fabrics a 2X speedup very closely approximates an output buffered fabric *

For multi-stage the speedup factor to approx output buffered behavior is not known

CRS-1 fabric’s ~5X speedup is constrained by available technology

* Balaji Prabhakar and Nick McKeown, Computer Systems Technical Report CSL-TR-97-738, November 1997.

Page 16: Cisco crs1

Fabric Flow Control Overview

Discard - time constant in the 10’s of ms range

Originates from ‘from fab’ and is directed at ‘to fab’.

Operates at a very fine granularity, discarding down to the level of individual destination raw queues.

Back Pressure - time constant in the 10’s of µs range.

Originates from the Fabric and is directed at ‘to fab’.

Operates per priority at increasingly coarse granularity:

Fabric Destination (one of 4608)

Fabric Group (one of 48 in phase one and 96 in phase two)

Fabric (stop all traffic into the fabric per priority)

Page 17: Cisco crs1

Reassembly Window

Cells transiting the Fabric take different paths between Sprayer and Sponge.

Cells for the same packet can arrive out of order.

The Reassembly Window for a given Source is defined as the worst-case differential delay that two cells from a packet encounter as they traverse the Fabric.

The Fabric limits the Reassembly Window
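
Since cells can arrive out of order, the receiving side has to hold them until a packet is complete, and the Reassembly Window bounds how long that can take. A minimal sketch of such resequencing follows; the cell format `(packet_id, seq, last, payload)` and the function name `reassemble` are hypothetical and not taken from the slides.

```python
def reassemble(cells):
    """Rebuild packets from cells that may arrive out of order.

    Each cell is a tuple (packet_id, seq, last, payload); this cell format
    is hypothetical. Returns packets in the order in which they complete.
    """
    pending = {}     # packet_id -> {seq: payload}
    expected = {}    # packet_id -> total cell count, known once 'last' arrives
    done = []
    for packet_id, seq, last, payload in cells:
        pending.setdefault(packet_id, {})[seq] = payload
        if last:
            expected[packet_id] = seq + 1
        parts = pending[packet_id]
        if expected.get(packet_id) == len(parts):
            data = b"".join(parts[i] for i in range(len(parts)))
            done.append((packet_id, data))
            del pending[packet_id], expected[packet_id]
    return done


# Cells of packet 7 arrive as 1, 2 (last), 0 after taking different paths.
arrivals = [(7, 1, False, b"B"), (7, 2, True, b"C"), (7, 0, False, b"A")]
print(reassemble(arrivals))
```

A real implementation would also need a timeout tied to the Reassembly Window so that a lost cell does not hold buffers forever; that is omitted here.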

Page 18: Cisco crs1

Linecard challenges

Power

COGS

Multiple interfaces

Intermediate buffering

Speed up

CPU subsystem

Page 19: Cisco crs1

Cisco CRS-1 Line Card

[Figure: line card block diagram with numbered packet flow steps 1 to 8. PLIM: OC192 Framer and Optics blocks and the Interface Module ASIC. MODULAR SERVICES CARD: RX METRO, Ingress Queuing, TX METRO, From Fabric ASIC, Egress Queuing, CPU, Squid GW. The two are joined through the MIDPLANE. Label: Egress Packet Flow From Fabric.]

Page 20: Cisco crs1

Cisco CRS-1 Line Card

[Figure: the line card block diagram (PLIM with OC192 Framer and Optics and Interface Module ASIC; Modular Services Card with RX METRO, Ingress Queuing, TX METRO, From Fabric ASIC, Egress Queuing, CPU, Squid GW) shown with a board photo labeled: Line Card CPU, Egress Metro, Ingress Metro, Ingress Queuing, Power Regulators, Fabric Serdes, From Fabric, Egress Queuing.]

Page 21: Cisco crs1

Cisco CRS-1 Line Card

[Figure: board photo labeled: Egress Metro, Ingress Metro, Line Card CPU, Ingress Queuing, Power Regulators, Fabric Serdes, From Fabric, Egress Queuing.]

Page 22: Cisco crs1

Ingress Metro

Cisco CRS-1 Line Card

Page 23: Cisco crs1

Metro Subsystem

Page 24: Cisco crs1

Metro Subsystem

What is it ?

Massively Parallel NP

Codename Metro

Marketing name SPP (Silicon Packet Processor)

What were the Goals ?

Programmability

Scalability

Who designed & programmed it ?

Cisco internal (Israel/San Jose)

IBM and Tensilica partners

Page 25: Cisco crs1

Metro Subsystem

Metro: 2500 balls, 250 MHz, 35 W

TCAM: 125 MSPS, 128k x 144-bit entries, 2 channels

FCRAM: 166 MHz DDR, 9 channels. Lookups and table memory

QDR2 SRAM: 250 MHz DDR, 5 channels. Policing state, classification results, queue length state

Page 26: Cisco crs1

Metro Top Level

Packet Out: 96 Gb/s BW

Packet In: 96 Gb/s BW

18 mm x 18 mm - IBM 0.13 µm

18M gates

8Mbit SRAM and RAs

Control Processor Interface

Proprietary 2Gb/s

Page 27: Cisco crs1

Gee-whiz numbers

188 32-bit embedded RISC cores

~50 Bips

175 Gb/s Memory BW

78 MPPS peak performance
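
These numbers imply a per-packet instruction budget. The back-of-the-envelope sketch below assumes the cores run at the 250 MHz clock listed for Metro and retire at most one instruction per cycle; both are assumptions rather than statements from the slides.

```python
cores = 188            # "188 32-bit embedded RISC cores"
clock_hz = 250e6       # Metro clock from the subsystem slide: 250 MHz (assumed core clock)
peak_pps = 78e6        # "78 MPPS peak performance"

# Assumption: each core retires at most one instruction per cycle.
peak_ips = cores * clock_hz                  # 4.7e10, close to the quoted "~50 Bips"
instr_per_packet = peak_ips / peak_pps       # roughly 600 instructions per packet

print(f"{peak_ips / 1e9:.0f} Gips, {instr_per_packet:.0f} instructions per packet")
```

Under these assumptions, every feature applied to a packet at peak rate has to fit within roughly 600 instructions across the chip.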

Page 28: Cisco crs1

Why Programmability ?

Simple forwarding – not so simple

Example FEATURES:

• MPLS–3 Labels

• Link Bundling (v4)

• Load Balancing L3 (v4)

• 1 Policer Check

• Marking

• TE/FRR

• Sampled Netflow

• WRED

• ACL

• IPv4 Multicast

• IPv6 Unicast

• Per prefix accounting

• GRE/L2TPv3 Tunneling

• RPF check (loose/strict) v4

• Load Balancing L3 (v6)

• Link Bundling (v6)

• Congestion Control

• IPv4 Unicast

[Figure: lookup algorithm data structures. A lookup over millions of routes returns a leaf with L3 info. A TCAM holds the policy based routing table, with PBR associative data held 1:1 in SRAM/DRAM. The leaf leads to an L3 load balance entry and then to an L2 adjacency with L2 info. Load balancing entries and adjacencies are held in SRAM/DRAM. Labels: hundreds of load balancing entries per …; 100k+ of adjacencies; pointer to statistics counters.]

Increasing pressure to add 1-2 levels of increased indirection for High Availability and increased update rates

Programmability also means

Ability to juggle feature ordering

Support for heterogeneous mixes of feature chains

Rapid introduction of new features (Feature Velocity)

Page 29: Cisco crs1

Metro Architecture Basics

Packet Distribution

[Figure: Metro block diagram. 96G packet in and 96G packet out, 188 PPEs, On-Chip Packet Buffer, Resource Fabric, Resources.]

Packet tails stored on-chip

~100 Bytes of packet context sent to PPEs

Run-to-completion (RTC)

Simple SW model

Efficient heterogeneous feature processing

RTC and Non-Flow based Packet distribution means scalable architecture

Costs

High instruction BW supply

Need RMW and flow ordering solutions

Page 30: Cisco crs1

Metro Architecture Basics

Packet Gather

[Figure: Metro block diagram. 96G packet in and 96G packet out, 188 PPEs, On-Chip Packet Buffer, Resource Fabric, Resources.]

Gather of Packets involves:

Assembly of final packets (at 100Gb/s)

Packet ordering after variable length processing

Gathering without new packet distribution

Page 31: Cisco crs1

Metro Architecture Basics

[Figure: Metro block diagram. 96G packet in and 96G packet out, 188 PPEs, On-Chip Packet Buffer, Resource Fabric, Resources.]

Packet Buffer accessible as Resource

Resource Fabric is parallel wide multi-drop busses

Resources consist of

Memories

Read-modify-write operations

Performance heavy mechanisms

Page 32: Cisco crs1

Metro Resources

Statistics: 512k

TCAM

Interface Tables

Policing: 100k+

Lookup Engine: 2M Prefixes

Table DRAM (10’s MB)

Queue Depth State

[Figure: Tree Bitmap example. Prefixes P1 to P9 are stored in a multibit trie: a root node with child pointers to child arrays at Level 1, Level 2, and Level 3.]

Lookup Engine uses Tree Bitmap algorithm

FCRAM and on-chip memory

High update rates

Configurable performance vs density

Will Eatherton et al., “Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates”, CCR April 2004 (vol. 34 no. 2) pp 97-123.

Page 33: Cisco crs1

16 PPE Clusters

Each Cluster of 12 PPE’s

Packet Processing Element (PPE)

0.5 sq mm per PPE

Page 34: Cisco crs1

Packet Processing Element (PPE)

[Figure: PPE block diagram. Processor core: 32-bit RISC with ICACHE, data memory, Cisco DMA, and memory mapped registers (distribution header, packet header, scratch pad). Cluster Instruction Memory, fed from Global Instruction Memory over the instruction bus, serves 12 PPEs; the Cluster Data Mux Unit serves 12 PPEs. Labels: Pkt Distribution, From Resources, Pkt Gather, To Resources.]

Tensilica Xtensa core with Cisco enhancements

32-bit, 5-stage pipeline

Code Density : 16/24 bit instructions

Small instruction cache and data memory

Cisco DMA engine – allows 3 outstanding Descriptor DMAs

10’s Kbytes Fast instruction memory

Page 35: Cisco crs1

Programming Model and Efficiency

Metro Programming Model

Run to completion programming model

Queued descriptor interface to resources

Industry leveraged tool flow

Efficiency Data Points

1 ucoder for 6 months: IPv4 with common features (ACL, PBR, QoS, etc.)

CRS-1 initial shipping datapath code was done by ~3 people

Page 36: Cisco crs1

Challenges

Constant power battle

Memory and IO

Die Size Allocation

PPEs Vs HW acceleration

Scalability

On-chip BW vs off-chip capacity

Procket NPU 100MPPS - limited scaling

Performance

Page 37: Cisco crs1

Future directions

POP convergence

Edge and core differences blur

Smartness in the network

More integrated services into the routing platforms

Feature sets needing acceleration expanding

Must leverage feature code across platforms/markets

Scalability (# of processors, amount of memory, BW)

Page 38: Cisco crs1

Summary

Router business is diverse

Network growth pushes routers to the edge

Customers expect scale on the one hand … and a smart network on the other

Routers become massively parallel processing machines

Page 39: Cisco crs1

Questions ?

Thank You

Page 40: Cisco crs1

Page 41: Cisco crs1

CRS-1 Positioning

Core router (overall BW, interfaces types)

1.2 Tbps, OC-768c Interface

Distributed architecture

Scalability/Performance

Scalable control plane

High Availability

Logical Routers

Multi-Chassis Support

Page 42: Cisco crs1

Network planes

Networks are considered to have three planes / operating timescales

Data: packet forwarding [μs, ns]

Control: flows/connections [ ms, secs]

Management: aggregates, networks [ secs, hours ]

Coupling between planes is in descending order (control-data more, management-control less)

Page 43: Cisco crs1

Exact Matches in Ethernet Switches: Trees and Tries

Binary Search Tree

[Figure: binary tree with < and > branches at each node; depth log2 N for N entries.]

Lookup time dependent on table size, but independent of address length, storage is O(N)

Binary Search Trie

[Figure: binary trie with 0 and 1 branches at each node; example keys 111 and 010.]

Lookup time bounded and independent of table size, storage is O(NW)
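
A minimal sketch of an exact-match binary trie, to make the bounded lookup time concrete: a lookup walks at most W levels regardless of how many entries are stored. The class name `BinaryTrie` and the dict-based node layout are illustrative choices, not taken from the slides.

```python
class BinaryTrie:
    """Exact-match binary trie over fixed-width keys of W bits."""

    def __init__(self, width):
        self.width = width
        self.root = {}

    def insert(self, key, value):
        node = self.root
        for i in reversed(range(self.width)):      # most significant bit first
            node = node.setdefault((key >> i) & 1, {})
        node["value"] = value

    def lookup(self, key):
        node = self.root
        for i in reversed(range(self.width)):      # never more than W steps
            node = node.get((key >> i) & 1)
            if node is None:
                return None                        # no entry with this prefix
        return node.get("value")


trie = BinaryTrie(width=3)
trie.insert(0b111, "port A")
trie.insert(0b010, "port B")
print(trie.lookup(0b111), trie.lookup(0b010), trie.lookup(0b011))
```

Each stored key can add up to W nodes, which is where the O(NW) storage bound comes from.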

Page 44: Cisco crs1

Exact Matches in Ethernet Switches: Multiway tries

16-ary Search Trie

[Figure: 16-ary trie. Each node holds entries from (0000, ptr) to (1111, ptr), one per 4-bit value; example keys 000011110000 and 111111111111.]

Ptr=0 means no children

Q: Why can’t we just make it a 2^48-ary trie?
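
One way to approach the question is to tabulate depth against node size for different strides over a 48-bit MAC address. The helper name `trie_shape` is illustrative, and the sketch counts pointers per node only, not bytes.

```python
import math

KEY_BITS = 48   # Ethernet MAC address width

def trie_shape(stride_bits):
    """Return (levels, pointers_per_node) for a multiway trie with this stride."""
    levels = math.ceil(KEY_BITS / stride_bits)
    fanout = 2 ** stride_bits
    return levels, fanout

for stride in (1, 4, 8, 16, 48):
    levels, fanout = trie_shape(stride)
    print(f"stride {stride:2d}: {levels:2d} levels, {fanout} pointers per node")
```

A larger stride means fewer memory accesses per lookup but exponentially larger nodes; at a stride of 48 the single node would need 2^48 pointers, one per possible address.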