• A Virtual Switch (vSwitch) is a software component within a server that allows inter-virtual machine (VM) communication as well as communication with the external world
• A vSwitch has a few key advantages:
  • Provides network functionalities right inside the hypervisor layer
  • Operations are similar to those of the hypervisor, yet with control over network functionality
  • Compared to a physical switch, it's easy to roll out new functionality, which on a physical switch can require hardware or firmware changes
• Is there a way to provide a software networking data plane:
• Able to load and chain Virtual Network Functions dynamically
• Extensible
• Fully programmable, able to freely access the raw network devices
• In-kernel: leverage all the existing kernel features for
  • Hardware support and portability
  • Guaranteed runtime safety
  • Predictable performance (delay, jitter, throughput…)
• LLVM backend: any platform that LLVM compiles into will work. (GCC backend in the works) → PORTABILITY!
Use Cases:
1. networking
2. tracing (analytics, monitoring, debugging)
3. in-kernel optimizations
4. hw modeling
5. crazy stuff...
*http://lwn.net/Articles/599755/
• BPF programs can attach to sockets, the traffic control (TC) subsystem, kprobe, syscalls, tracepoints…
• Sockets can be STREAM (L4/TCP), DATAGRAM (L4/UDP) or RAW (TC)
• This allows hooking at different levels of the Linux networking stack, providing the ability to act on traffic that has or hasn’t been processed already by other pieces of the stack
• Opens up the possibility to implement network functions at different layers of the stack
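As a small illustration of the socket types listed above, the following Python sketch creates one stream socket and one datagram socket and reports which transport each maps to. The helper name `describe` is hypothetical and only serves this example. Raw sockets are left out because on Linux they normally require the CAP_NET_RAW capability, and attaching a BPF program (done through a `setsockopt` call on the socket) is not shown here.

```python
import socket

def describe(sock):
    """Return a label for the socket type (hypothetical helper for this sketch)."""
    kinds = {
        socket.SOCK_STREAM: "STREAM (TCP)",    # connection-oriented byte stream
        socket.SOCK_DGRAM: "DATAGRAM (UDP)",   # connectionless datagrams
    }
    return kinds.get(sock.type, "other")

# Creating the sockets does not send any traffic; nothing is bound or connected.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp_sock, \
     socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_sock:
    tcp_label = describe(tcp_sock)
    udp_label = describe(udp_sock)

print(tcp_label)
print(udp_label)
```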
Hooking BPF into the Linux networking stack (TX)
[Figure labels: USERSPACE; KERNEL SPACE; Socket (TCP/UDP); IP / routing; TC / traffic control; dev_queue_xmit(); driver; TAP/Raw (RO); HW/veth/cont; BPF]
For simplicity, the following slides collapse this view into a single “kernel networking stack”
Is there an easier/safer way to use this technology? Higher-level APIs for producing and using BPF code
• BPF ensures that programs to be loaded in the kernel won’t crash or loop forever, by running them through a “verifier” at load time (BPF_PROG_LOAD)
• But it is today possible to write programs in C that would compile into invalid BPF (C is like that), and a user would only know upon trying to run it
• A BPF-specific frontend would allow for a compiler to provide feedback on the validity of the code
• Current approaches to converting a C program to BPF involve many custom steps, tools
  • clang frontend, llvm backend with BPF support
  • kernel samples/bpf/libbpf.c APIs
  • ELF loader with section rewrites
  • programs use low-level helper functions
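To make the verifier idea above more concrete (reject programs that could loop forever before they are ever run), here is a toy sketch in Python. It is not the kernel's verifier: the instruction format, the function name `toy_verify`, and the `max_insns` default are all assumptions made for illustration. The sketch only checks that every jump goes forward and stays in range, and that the program ends with an exit, which is enough to guarantee termination for this toy format.

```python
from typing import List, Tuple

# Hypothetical toy instruction: (opcode, jump_offset).
# "jmp" jumps relative to the next instruction; "exit" ends the program.
Insn = Tuple[str, int]

def toy_verify(prog: List[Insn], max_insns: int = 4096) -> Tuple[bool, str]:
    """Accept only programs that provably terminate (toy rules, assumed)."""
    if not prog:
        return False, "empty program"
    if len(prog) > max_insns:
        return False, "program too large"
    for pc, (op, off) in enumerate(prog):
        if op == "jmp":
            if off < 0:
                # A backward jump could form a loop, so refuse it outright.
                return False, f"back-edge at insn {pc}"
            if pc + 1 + off >= len(prog):
                return False, f"jump out of range at insn {pc}"
    if prog[-1][0] != "exit":
        return False, "last insn is not exit"
    return True, "ok"

good = [("mov", 0), ("jmp", 1), ("mov", 0), ("exit", 0)]
looping = [("mov", 0), ("jmp", -2), ("exit", 0)]

print(toy_verify(good))
print(toy_verify(looping))
```

Because the check happens before the program runs, the author gets feedback at load time rather than at execution time, which is the property a BPF-specific frontend could surface even earlier, at compile time.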
Demo 2: Using BPF for a versatile networking application
• Let’s assume that we have a set of applications running on top of a multitenant overlay network. Think an OpenStack cloud running on top of VxLAN, or an IP VPN running on top of MPLS
• Let’s store statistics of all the endpoints for every “overlay”, and also the endpoints for every “underlay”, in real time, without latency. Think seeing in real time the traffic between all VMs of an OpenStack cloud (without having to have administrative access), or being able to see the traffic between every CE router, IP phone, server or endpoint connected to the IP VPN
• Write a program that measures the traffic traversing the physical network and dynamically stores measurements of all metadata independently of whether it’s outer (VxLAN, MPLS) or inner (Ethernet/IP). Then display on demand each level of depth
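The data model behind this demo can be sketched in plain userspace Python. This is not the demo's BPF program: it is a minimal illustration, assuming IPv4 without VLAN tags and VXLAN on the IANA-assigned UDP port 4789 (MPLS is not covered). The function names `parse_ipv4`, `account`, `ipv4_frame` and `vxlan_frame` are hypothetical helpers, and the frames are built by hand rather than captured from a network.

```python
import socket
import struct
from collections import Counter

VXLAN_PORT = 4789  # IANA-assigned VXLAN UDP port

def parse_ipv4(frame):
    """Return (src, dst, proto, l4_bytes) for an Ethernet+IPv4 frame, else None."""
    if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != 0x0800:
        return None
    ihl = (frame[14] & 0x0F) * 4                      # IPv4 header length in bytes
    src = socket.inet_ntoa(frame[26:30])
    dst = socket.inet_ntoa(frame[30:34])
    return src, dst, frame[23], frame[14 + ihl:]

def account(frame, stats):
    """Add the frame's bytes to the underlay and, if VXLAN, overlay counters."""
    outer = parse_ipv4(frame)
    if outer is None:
        return
    src, dst, proto, l4 = outer
    stats["underlay"][(src, dst)] += len(frame)
    if proto != 17 or len(l4) < 16:                   # need UDP (8) + VXLAN (8)
        return
    if struct.unpack("!H", l4[2:4])[0] != VXLAN_PORT:
        return
    vni = int.from_bytes(l4[12:15], "big")            # 24-bit VXLAN network id
    inner = parse_ipv4(l4[16:])
    if inner:
        stats["overlay"][(vni, inner[0], inner[1])] += len(l4) - 16

def ipv4_frame(src, dst, proto, payload):
    """Build a minimal Ethernet+IPv4 frame (checksums left at zero)."""
    eth = b"\x00" * 12 + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0, 0, 64,
                     proto, 0, socket.inet_aton(src), socket.inet_aton(dst))
    return eth + ip + payload

def vxlan_frame(outer_src, outer_dst, vni, inner):
    """Wrap an inner Ethernet frame in UDP/VXLAN and an outer IPv4 frame."""
    vxlan = struct.pack("!B3s3sB", 0x08, b"\x00" * 3, vni.to_bytes(3, "big"), 0)
    udp = struct.pack("!HHHH", 49152, VXLAN_PORT, 8 + len(vxlan) + len(inner), 0)
    return ipv4_frame(outer_src, outer_dst, 17, udp + vxlan + inner)

stats = {"underlay": Counter(), "overlay": Counter()}
inner = ipv4_frame("192.168.0.1", "192.168.0.2", 6, b"x" * 20)
account(vxlan_frame("10.0.0.1", "10.0.0.2", 42, inner), stats)
account(ipv4_frame("10.0.0.3", "10.0.0.4", 6, b"y" * 10), stats)

print(stats["underlay"])
print(stats["overlay"])
```

Keying the overlay counter by the VXLAN network identifier as well as the inner addresses keeps tenants with overlapping address ranges separate, which matters in the multitenant setting described above.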