Mar 27, 2015

Page 1: Design Requirements for Bullet-Proof Packet Passers Avi Freedman avi@freedman.net Chief Technical Officer, Netaxs VP and Chief Network Architect, Akamai.

Design Requirements for

Bullet-Proof Packet Passers

Avi Freedman

avi@freedman.net

Chief Technical Officer, Netaxs

VP and Chief Network Architect, Akamai

Overview

• Goals and problems in Good Networking

• Current and future SLAs

• Failure analysis

• Hardware requirements

• Software requirements

• Sample architecture – Nortel OPC

• Open questions

Goals for Good Networking

• The four things that customers seem to want from IP networking:
  – Stability
  – Performance
  – Burstability/capacity assurance
  – Price

• Order varies, but Stability is almost always #1

Problems in Good Networking

• Performance is often a backbone capacity issue, and more often a peering/transit issue.

• Burstability problems come from a lack of large aggregation capabilities (no 100 Gb/s ports to connect 1 Gb/s customers to). This is a solvable engineering problem, though, given enough of even today’s hardware.

Problems in Good Networking

• The biggest problem is stability. Four main causes:
  – Operator error
  – Software
  – Fiber cuts
  – Hardware

• One can argue over ranking, but all are important.

• Fiber is a solvable issue with money and engineering.

• We’ll revisit these.

Current and Future SLAs

• Today’s SLAs are fairly weak. SLAs of the future will trend towards minutes per year of outage, with large credits for complete outages.

• CDNs already offer SLAs that give a 1-day credit for a 15-minute slowdown (not even an outage).

• Today’s hardware and software cannot be relied upon to pass IP packets reliably enough to meet these SLAs.

• At some point, 5 minutes/year of system-wide outage is probably all that customers will tolerate, and the first network to offer it in a vacuum (before any competitor does) will win huge market share.
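
To put the 5 minutes/year figure in perspective, it can be converted to an availability percentage and back. This is a minimal sketch, assuming a 365-day year; the function names are illustrative and not from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a non-leap year


def availability_from_downtime(minutes_per_year: float) -> float:
    """Fraction of the year the service is up, given yearly downtime in minutes."""
    return 1.0 - minutes_per_year / MINUTES_PER_YEAR


def downtime_from_availability(availability: float) -> float:
    """Yearly downtime in minutes permitted by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # 5 minutes/year sits just above "five nines".
    print(f"{availability_from_downtime(5):.5%} available")
    print(f"{downtime_from_availability(0.99999):.2f} minutes/year at 99.999%")
```

Under these assumptions, 5 minutes/year is slightly stricter than 99.999% availability, which allows a little over 5 minutes.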

Failure Analysis – Op Error

• What causes operator error?

• Often it’s not ignorance, but the fact that doing distributed configuration is hard with today’s tools.

• Key point: the Cisco ‘no’ method (removing a configuration line by re-entering it prefixed with ‘no’) has caused many a network outage.

• GUIs are unwieldy, though.

• And Unix OS on routers is a security problem!

• Industry work on ‘safer’ GUIs is needed.

Failure Analysis - Hardware

• Hardware is typically less of a problem, but OIR (online insertion and removal) too often turns out to mean “online insert and reboot”.

• The design needs to be simple, elegant, and redundant.

• Ideally, scalable and expandable as well, but simplicity of design is the best assurance of stability.

Failure Analysis - Software

• Router software causes literally hundreds of outages per year – even (excuse the term) megalapses inside networks.

• Most of the problems do NOT relate to protocol design, though there are scaling issues to be solved there.

• Most of the problems come from:
  – Bad code
  – Bad OS (OS fails to protect against bad code)

Failure Analysis – CPU Protection

• Additionally, vendors chronically fail to provide sufficient protection for the route-processing engines, and the problem grows as denial-of-service attacks get more aggressive!

• The industry needs to describe to vendors what rules are needed:
  – (Don’t allow multicast except for OSPF to connected interfaces, etc…)

Failure Analysis – Software Modularity

• In addition to contributing to bad code, the more monolithic nature of current router OSs makes it hard to avoid downtime while upgrading the network.

• Upgrade-on-the-fly (with a base OS that remains unchanged) is an elusive goal, but it is achievable – 5ESS and DMS boxes prove it.

Sample Architecture – Nortel OPC

• As a case study, we consider the Nortel OPTera Packet Core, which has been designed around carrier-class robustness, with feedback from industry and telephony-switch engineers.

• The OPC is a 3+-year-old research project that went into “product” mode about a year ago. Products are about a year out, so Nortel is aggressively seeking input about robustness!

OPC – Design Requirements

• The OPC team defined 99.999% as the target uptime, and defined “uptime” as uptime across ports. So, about 5 minutes of downtime per year across all (of up to) 480 ports, or potentially more downtime across fewer ports.

• The budget figures on 2 software upgrades/year, and splits “acceptable” failures roughly evenly between hardware and software.
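
One way to read “uptime across ports” is as a yearly budget of port-minutes. The sketch below works under that reading; the exact 50/50 hardware/software split and all names are assumptions for illustration (the slide says only “roughly evenly”).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assuming a non-leap year
TOTAL_PORTS = 480                 # maximum port count from the slide
TARGET_AVAILABILITY = 0.99999     # "five nines" target from the slide


def port_minute_budget(ports: int = TOTAL_PORTS,
                       availability: float = TARGET_AVAILABILITY) -> float:
    """Total port-minutes of downtime allowed per year across all ports."""
    return ports * (1.0 - availability) * MINUTES_PER_YEAR


def allowed_outage_minutes(ports_affected: int) -> float:
    """Minutes a single yearly outage may last if it hits only some ports."""
    return port_minute_budget() / ports_affected


if __name__ == "__main__":
    budget = port_minute_budget()
    print(f"yearly budget: {budget:.2f} port-minutes")
    # ASSUMPTION: an exact even split between hardware and software.
    print(f"hardware share: {budget / 2:.2f}, software share: {budget / 2:.2f}")
    print(f"all 480 ports down: {allowed_outage_minutes(480):.2f} min")
    print(f"only 48 ports down: {allowed_outage_minutes(48):.2f} min")
```

Under this reading, an outage confined to a tenth of the ports can last ten times as long before the budget is spent.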

OPC – Hardware Overview

• The OPC starts with a base 20-slot “application shelf” chassis of port and/or processor cards, and fabric slots. The base config can run an in-chassis fabric, but is not expandable on the fly.

• If broken out into an application shelf and a fabric shelf, it can be expanded to the full 480-slot config without downtime or packet loss.

OPC – Hardware Overview

• Each slot has (up to) 10 Gb/s of “port” capacity, and 16 Gb/s of backplane (14.5 Gb/s effective after overhead).

• Maximally configured, it is a 4.8 Tb/s router consisting of 24 application shelves in 12 racks, 16 fabric shelves in 4 bays, and a processor shelf.

• Shelves can be up to 1 km apart (per spec, the entire system must fit within a 1 km diameter), though it’s not clear this enhances robustness until the router can operate partitioned.
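
The headline numbers on these two hardware slides are mutually consistent, as a quick check shows. This sketch assumes every one of the 20 slots in each application shelf carries a port card at the full 10 Gb/s; the variable names are illustrative.

```python
SLOTS_PER_SHELF = 20            # base "application shelf" chassis
APPLICATION_SHELVES = 24        # maximal configuration
PORT_GBPS_PER_SLOT = 10         # "port" capacity per slot
BACKPLANE_GBPS_RAW = 16         # backplane capacity per slot
BACKPLANE_GBPS_EFFECTIVE = 14.5 # after overhead

# ASSUMPTION: all slots in all application shelves hold port cards.
total_slots = SLOTS_PER_SHELF * APPLICATION_SHELVES
total_port_tbps = total_slots * PORT_GBPS_PER_SLOT / 1000

# Share of raw backplane bandwidth consumed by overhead.
backplane_overhead = 1 - BACKPLANE_GBPS_EFFECTIVE / BACKPLANE_GBPS_RAW

# Effective backplane bandwidth relative to port capacity (speed-up).
backplane_speedup = BACKPLANE_GBPS_EFFECTIVE / PORT_GBPS_PER_SLOT

if __name__ == "__main__":
    print(f"slots: {total_slots}, port capacity: {total_port_tbps} Tb/s")
    print(f"backplane overhead: {backplane_overhead:.1%}")
    print(f"backplane speed-up over port capacity: {backplane_speedup:.2f}x")
```

So 24 shelves of 20 slots give the 480 slots quoted earlier, and 480 slots at 10 Gb/s give the 4.8 Tb/s total.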

OPC – Fabric

• The OPC fabric is “passive” – with each possible set of boards, the config is fixed, and no software is required to drive or configure the fabric.

• It can be imagined as parallel train tracks, with each board being a “station”, and slightly fewer “trains” than stations, each shuttling 4 cells of traffic (each cell being at one of 4 fixed priorities). More boards means more stations.

OPC – Card Architecture

• Each card has a general-purpose CPU (Motorola 750), and two packet-processor chips (the RSP2).

• The RSP2 runs “software”, mostly microcode, scheduling, etc…

• The RSP2 can execute up to 100 instructions on each of 16 packets in parallel, and then handles the packets serially for packet modification.

• For read-only packet processing, within 1% of line rate is possible per card. 40-43 byte packets run at line rate, 65-70 byte packets yield < 1% loss, and anything larger runs at line rate.
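
To see why small packets are the hard case, it helps to compute the packet rate a card must sustain. This is a rough sketch that assumes the full 10 Gb/s of port capacity is in use and ignores layer-2 framing overhead, so real figures would be somewhat lower; the function names are illustrative.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a link with fixed-size packets."""
    return link_bps / (packet_bytes * 8)


def nanoseconds_per_packet(link_bps: float, packet_bytes: int) -> float:
    """Time budget per packet at line rate, in nanoseconds."""
    return 1e9 / packets_per_second(link_bps, packet_bytes)


if __name__ == "__main__":
    LINK_BPS = 10e9  # per-slot "port" capacity from the hardware slides
    for size in (40, 64, 1500):
        pps = packets_per_second(LINK_BPS, size)
        ns = nanoseconds_per_packet(LINK_BPS, size)
        print(f"{size:>5}-byte packets: {pps / 1e6:8.2f} Mpps, {ns:8.1f} ns each")
```

At 40 bytes the card has only tens of nanoseconds per packet, which is why parallel packet processors are needed to stay near line rate.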

OPC - Software

• The major cause of software-based router failures is bad code. Ultimately, better software engineering is required.

• Along the way, sound software architecture and protective features are needed.

• And on-the-fly upgrade-ability.

• As well as main-CPU-protection.

OPC – Main-CPU Protection

• Each board’s RSP2s can do packet classification inbound or outbound, can throw away packets, replicate them (multicast or sniffing), kick them up to the main CPU, or send them to another port/card.

• The capability exists as well to shape different classes of traffic as part of kicking packets up to the main CPU on-card or on another card.

• The key is the ruleset; input is needed.

Main CPU Protection

• As a general matter, the rules should be implemented consistently across multiple router vendors.

• Rules such as:
  – 64k/sec of BGP from an IP, only if we are talking to that IP
  – No non-OSPF multicast
  – 10 packets per second to each connected IP
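
The three example rules above can be sketched as a small classifier with token-bucket rate limits. This is an illustration only, not any vendor’s implementation. It assumes “64k/sec” means 64 kbit/s, that BGP is recognized by TCP port 179, and that anything matching no rule is dropped; `CpuGuard`, `Packet`, and `TokenBucket` are hypothetical names.

```python
from dataclasses import dataclass

BGP_PORT = 179             # well-known TCP port for BGP
BGP_BITS_PER_SEC = 64_000  # ASSUMPTION: "64k/sec" read as 64 kbit/s
LOCAL_PKTS_PER_SEC = 10    # per connected IP, from the slide


class TokenBucket:
    """Token bucket: refills at `rate` per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, None

    def allow(self, cost: float, now: float) -> bool:
        if self.last is not None:
            refill = (now - self.last) * self.rate
            self.tokens = min(self.burst, self.tokens + refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str            # e.g. "tcp", "ospf", "icmp"
    src_port: int = 0
    dst_port: int = 0
    size_bytes: int = 64
    multicast: bool = False


class CpuGuard:
    """Decides whether a packet may be kicked up to the main CPU."""

    def __init__(self, bgp_peers, local_ips):
        self.bgp_peers = set(bgp_peers)  # IPs we are configured to talk BGP to
        self.local_ips = set(local_ips)  # our own connected-interface IPs
        self.buckets = {}

    def _bucket(self, key, rate):
        # One bucket per key, with one second's worth of burst.
        return self.buckets.setdefault(key, TokenBucket(rate, rate))

    def admit(self, pkt: Packet, now: float) -> bool:
        # Rule: no non-OSPF multicast.
        if pkt.multicast:
            return pkt.proto == "ospf"
        # Rule: rate-limited BGP, only from configured peers.
        if pkt.proto == "tcp" and BGP_PORT in (pkt.src_port, pkt.dst_port):
            if pkt.src not in self.bgp_peers:
                return False
            bucket = self._bucket(("bgp", pkt.src), BGP_BITS_PER_SEC)
            return bucket.allow(pkt.size_bytes * 8, now)
        # Rule: 10 packets per second to each connected IP.
        if pkt.dst in self.local_ips:
            bucket = self._bucket(("local", pkt.dst), LOCAL_PKTS_PER_SEC)
            return bucket.allow(1, now)
        return False  # ASSUMPTION: default deny for everything else
```

For example, `CpuGuard(bgp_peers={"10.0.0.2"}, local_ips={"10.0.0.1"})` would admit BGP only from 10.0.0.2 and cap other traffic addressed to 10.0.0.1 at 10 packets per second. Whether BGP should also count against the per-IP limit is a policy choice the slide leaves open.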

Nortel OPC - CLI

• Nortel is soliciting input on robust CLI design to reduce operator error.

• Possibilities include the ability to add comments, transactions (commit/rollback), and network-wide synchronized updates (though these can cause instability as well).
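
The comment and commit/rollback ideas can be illustrated with a candidate/running configuration model, in which edits touch only a candidate copy until they are committed. This is a hypothetical sketch, not Nortel’s CLI; the `ConfigSession` class and its methods are invented for illustration.

```python
import copy


class ConfigSession:
    """Edits go to a candidate config; the running config changes only on commit."""

    def __init__(self, running=None):
        self.running = dict(running or {})
        self.candidate = copy.deepcopy(self.running)
        self.history = []  # previously committed configs, newest last

    def set(self, key, value, comment=""):
        # Comments travel with the setting they explain.
        self.candidate[key] = {"value": value, "comment": comment}

    def delete(self, key):
        self.candidate.pop(key, None)

    def diff(self):
        """Map each changed key to its (running, candidate) pair."""
        keys = sorted(set(self.running) | set(self.candidate))
        return {k: (self.running.get(k), self.candidate.get(k))
                for k in keys
                if self.running.get(k) != self.candidate.get(k)}

    def commit(self, validate=lambda cfg: True):
        # Apply all pending changes at once, or none if validation fails.
        if not validate(self.candidate):
            raise ValueError("candidate rejected; running config unchanged")
        self.history.append(self.running)
        self.running = copy.deepcopy(self.candidate)

    def rollback(self):
        # Return to the config that was running before the last commit.
        if not self.history:
            raise RuntimeError("nothing to roll back to")
        self.running = self.history.pop()
        self.candidate = copy.deepcopy(self.running)
```

With this model a mistyped deletion does nothing until commit, the operator can review `diff()` first, and a bad commit can be undone with a single `rollback()`.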

OPC – Software Architecture

• We now talk about the software that runs on the main CPUs, and the main Motorola 750 procs per board.

• Chorus multi-threaded, multi-CPU real-time OS as a base. Has memory protection and preemptive multitasking.

• IPC layer (“RACE”) on top, handles communication between processes (“agents”) and threads. Among other things, RACE allows “virtual synchrony”: running multiple processes in parallel and taking the first answer as a result.

• This allows for easy upgrading of processes, and robustness in case of single- or multi-card failures.
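
The “run in parallel, take the first answer” idea described above can be sketched with threads standing in for agents. This is not RACE’s API, which the slides do not show; `first_answer` and the replica functions in the usage note are hypothetical, and the sketch covers only the first-answer behavior, not state synchronization between replicas.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_answer(replicas, request):
    """Run every replica on the same request; return the first successful result.

    A replica that raises is skipped, so one crashed or mid-upgrade copy
    does not prevent an answer as long as another copy succeeds.
    """
    errors = []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica, request) for replica in replicas]
        for future in as_completed(futures):
            exc = future.exception()
            if exc is None:
                # Note: leaving the `with` block still waits for the
                # slower replicas to finish before returning.
                return future.result()
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")
```

For instance, with an old replica that crashes and a new one that works, `first_answer([old_agent, new_agent], request)` returns the new agent’s answer, which is the property that makes upgrading one copy at a time safe.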

Open Questions

• What are other vendors doing? Cisco, Juniper, and Avici all seem to be lacking in major areas that Nortel is addressing. Of course, you can buy Cisco, Juniper, and Avici products now.

• CLI design input

• CPU protection rule input

• Software architecture input (what modules should be on-the-fly upgrade-able); for example, trade-offs in BGP convergence vs. upgrade-ability.