Userspace networking
Posted on 08-May-2015
Networking in Userspace: Living on the Edge
Stephen Hemminger <stephen@networkplumber.org>
Problem Statement
[Chart: packets per second (bidirectional) vs. packet size (bytes), 64 to 1504 bytes. Source: Intel DPDK Overview]
Server vs Infrastructure
                         64-byte packets    1024-byte packets
Packets/second           14.88 million      1.2 million
Arrival rate             67.2 ns            835 ns
Clock cycles at 2 GHz    135                1670
Clock cycles at 3 GHz    201                2505

L3 hit on Intel® Xeon®: ~40 cycles. L3 miss (memory read): ~201 cycles at 3 GHz.
Traditional Linux networking
● Good old sockets
  – flexible, portable, but slow
● Memory-mapped buffers
  – efficient, but still constrained by the architecture
● TCP Offload Engine
  – runs in the kernel
Slide 7
The OpenOnload architecture
● Network hardware provides a user-safe interface which can route Ethernet packets to an application context based on flow information contained within headers
[Diagram: the network adaptor DMAs packets either to the kernel context (network driver + protocol) or directly into application contexts, each application with its own protocol stack and driver]
No new protocols
Slide 8
The OpenOnload architecture
● Protocol processing can take place both in the application and kernel context for a given flow
Enables persistent / asynchronous processing
Maintains existing network control-plane
Slide 9
The OpenOnload architecture
● Protocol state is shared between the kernel and application contexts through a protected shared-memory communications channel
Enables correct handling of protocol state with high performance
Slide 11
Performance metrics
● Overhead
  – Networking overheads take CPU time away from your application
● Latency
  – Holds your application up when it has nothing else to do
  – H/W + flight time + overhead
● Bandwidth
  – Dominates latency when messages are large
  – Limited by: algorithms, buffering, and overhead
● Scalability
  – Determines how overhead grows as you add cores, memory, threads, sockets, etc.
Slide 12
Anatomy of kernel-based networking
Slide 13
A user-level architecture?
Slide 14
Direct & safe hardware access
Slide 88
Some performance results
● Test platform: typical commodity server
  – Intel Clovertown 2.3 GHz quad-core Xeon (x1), 1.3 GHz FSB, 2 GB RAM
  – Intel 5000X chipset
  – Solarflare Solarstorm SFC4000 (B) controller, CX4
  – Back-to-back
  – Red Hat Enterprise Linux 5 (2.6.18-8.el5)
Slide 89
Performance: Latency and overhead
             ½ round-trip latency (µs)   CPU overhead (µs)
Hardware     4.2                         --
Kernel       11.2                        7.0
Onload       5.3                         1.1

● TCP ping-pong with 4-byte payload
● 70-byte frame: 14 + 20 + 20 + 12 + 4
Slide 92
Performance: Streaming bandwidth
Slide 93
Performance: UDP transmit
● Message rate: 4-byte UDP payload (46-byte frame)

             Onload      Kernel
1 sender     2,030,000   473,000
Slide 94
Performance: UDP transmit
● Message rate: 4-byte UDP payload (46-byte frame)

             Onload      Kernel
1 sender     2,030,000   473,000
2 senders    3,880,000   532,000
Slide 95
Performance: UDP receive
Slide 100
OpenOnload Open Source
● OpenOnload available as Open Source (GPLv2)
  – Please contact us if you're interested
● Compatible with x86 (ia32, amd64/em64t)
● Currently supports SMC10GPCIe-XFP and SMC10GPCIe-10BT NICs
  – Could support other user-accessible network interfaces
● Very interested in user feedback
  – On the technology and project directions
Netmap
http://info.iet.unipi.it/~luigi/netmap/
● BSD (and Linux port)
● Good scalability
● Libpcap emulation
Netmap API
● Access
  – open("/dev/netmap")
  – ioctl(fd, NIOCREGIF, arg)
  – mmap(..., fd, 0) maps buffers and rings
● Transmit
  – fill up to avail buffers, starting from slot cur
  – ioctl(fd, NIOCTXSYNC) queues the packets
● Receive
  – ioctl(fd, NIOCRXSYNC) reports newly received packets
  – process up to avail buffers, starting from slot cur
These ioctl()s are non-blocking.
Netmap API: synchronization
● poll() and select(), what else!
  – POLLIN and POLLOUT decide which sets of rings to work on
  – work as expected, returning when avail > 0
  – interrupt mitigation delays are propagated up to the userspace process
Netmap: multiqueue
● Of course.
  – one netmap ring per physical ring
  – by default, the fd is bound to all rings
  – ioctl(fd, NIOCREGIF, arg) can restrict the binding to a single ring pair
  – multiple fds can be bound to different rings on the same card
  – the fds can be managed by different threads
  – threads mapped to cores with pthread_setaffinity_np()
Netmap and the host stack
● While in netmap mode, the control path remains unchanged:
  – ifconfig, ioctls, etc. still work as usual
  – the OS still believes the interface is there
● The data path is detached from the host stack:
  – packets from the NIC end up in RX netmap rings
  – packets from TX netmap rings are sent to the NIC
● The host stack is attached to an extra pair of netmap rings:
  – packets from the host go to a SW RX netmap ring
  – packets from a SW TX netmap ring are sent to the host
  – these rings are managed using the netmap API
Netmap: Tx performance
Netmap: Rx Performance
Netmap Summary

Packet forwarding       Mpps
FreeBSD bridging        0.690
Netmap + libpcap        7.500
Netmap                  14.88

Open vSwitch            Mpps
userspace               0.065
Linux                   0.600
FreeBSD                 0.790
FreeBSD + netmap/pcap   3.050
Intel DPDK Architecture
TRANSFORMING COMMUNICATIONS (Intel Restricted Secret)
The Intel® DPDK Philosophy
• Must run on any IA CPU
  ‒ From the Intel® Atom™ processor to the latest Intel® Xeon® processor family
  ‒ Essential to the IA value proposition
• Focus on the fast path
  ‒ Sending large numbers of packets to the Linux kernel/GPOS will bog the system down
• Provide software examples that address common network performance deficits
  ‒ Best practices for software architecture
  ‒ Tips for data structure design and storage
  ‒ Help the compiler generate optimum code
  ‒ Address the challenges of achieving 80 Mpps per CPU socket
Control Plane Data Plane
Intel® DPDK Fundamentals
• Implements a run-to-completion model or pipeline model
• No scheduler: all devices accessed by polling
• Supports 32-bit and 64-bit, with/without NUMA
• Scales from Intel® Atom™ to Intel® Xeon® processors
• Number of cores and processors not limited
• Optimal packet allocation across DRAM channels
Platform Hardware
Intel® DPDK Libraries
Intel® Data Plane Development Kit (Intel® DPDK)
Intel® DPDK embeds optimizations for the IA platform:
- Data plane libraries and optimized NIC drivers in Linux user space
- Run-time environment
- Environment abstraction layer and boot code
- BSD-licensed & source downloadable from Intel and leading eco-partners
[Diagram: customer applications in user space sit on the Environment Abstraction Layer above the Intel® DPDK libraries (packet flow classification, NIC poll-mode library, queue/ring functions, buffer management), with the Linux kernel in kernel space below]
Intel® DPDK Libraries and Drivers
• Memory Manager: Responsible for allocating pools of objects in memory. A pool is created in huge page memory space and uses a ring to store free objects. It also provides an alignment helper to ensure that objects are padded to spread them equally on all DRAM channels.
• Buffer Manager: Reduces by a significant amount the time the operating system spends allocating and de-allocating buffers. The Intel® DPDK pre-allocates fixed size buffers which are stored in memory pools.
• Queue Manager: Implements safe lockless queues, instead of using spinlocks, that allow different software components to process packets while avoiding unnecessary wait times.
• Flow Classification: Provides an efficient mechanism which incorporates Intel® Streaming SIMD Extensions (Intel® SSE) to produce a hash based on tuple information so that packets may be placed into flows quickly for processing, thus greatly improving throughput.
• Poll Mode Drivers: The Intel® DPDK includes Poll Mode Drivers for 1 GbE and 10 GbE Ethernet* controllers which are designed to work without asynchronous, interrupt-based signaling mechanisms, which greatly speeds up the packet pipeline.
Intel® DPDK Native and Virtualized Forwarding Performance
Comparison
             Netmap           DPDK           OpenOnload
License      BSD              BSD            GPL
API          Packet + pcap    Packet + lib   Sockets
Kernel       Yes              Yes            Yes
HW support   Intel, Realtek   Intel          Solarflare
OS           FreeBSD, Linux   Linux          Linux
Issues
● Out of tree kernel code
– Non standard drivers
● Resource sharing
– CPU
– NIC
● Security
– No firewall
– DMA isolation
What's needed?
● Netmap
– Linux version (not port)
– Higher level protocols?
● DPDK
– Wider device support
– Ask Intel
● Openonload
– Ask Solarflare
● OpenOnload
  – A user-level network stack (Google tech talk): Steve Pope, David Riddoch
● Netmap – Luigi Rizzo
  – http://info.iet.unipi.it/~luigi/netmap/talk-atc12.html
● DPDK
  – Intel DPDK Overview
  – Disruptive network IP networking: Naoto MASMOTO
Thank you