Cilium: Fast IPv6 Container Networking with BPF and XDP LinuxCon 2016, Toronto Thomas Graf (@tgraf__) Kernel, Cilium & Open vSwitch Team Noiro Networks (Cisco)
Jan 06, 2017

Thomas Graf
Page 1: Cilium - Fast IPv6 Container Networking with BPF and XDP

Cilium: Fast IPv6 Container Networking with BPF and XDP

LinuxCon 2016, Toronto

Thomas Graf (@tgraf__) Kernel, Cilium & Open vSwitch Team Noiro Networks (Cisco)

Page 2: Cilium - Fast IPv6 Container Networking with BPF and XDP

The Cilium Experiment

Scale

– Addressing: IPv6?

– Policy: Linear lists don’t scale. Alternative?

Extensibility

– Can we be as extensible as userspace networking in the kernel?

Simplicity

– What is an appropriate abstraction away from traditional networking?

Performance

– Do we sacrifice performance in the process?

Page 3: Cilium - Fast IPv6 Container Networking with BPF and XDP

Scaling Addressing

Solution:

– IPv6 addresses with host scope allocator

Pros:

– Everything is globally addressable

– No NAT

– Path to ILA for mobility of tasks

Cons:

– Legacy IPv4-only endpoints/applications

→ Optional IPv4 addressing (+ NAT)

→ NAT46: Provide IPv6-only applications to IPv4-only clients
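A host scope allocator can be sketched as follows. This is a minimal illustration and not Cilium's implementation: the cluster prefix `beef::/64`, the /112 per-host prefix length and the names `host_prefix` and `HostScopeAllocator` are all assumptions made for the example. Each host owns a prefix and hands out container addresses from it locally, so no cluster-wide coordination is needed per container.

```python
import ipaddress


def host_prefix(cluster_prefix: str, host_index: int, new_prefix: int = 112) -> str:
    """Return the host_index-th sub-prefix of the cluster prefix (assumed layout)."""
    cluster = ipaddress.IPv6Network(cluster_prefix)
    size = 1 << (128 - new_prefix)
    base = int(cluster.network_address) + host_index * size
    net = ipaddress.IPv6Network((base, new_prefix))
    if not net.subnet_of(cluster):
        raise ValueError("host index does not fit into the cluster prefix")
    return str(net)


class HostScopeAllocator:
    """Hands out container addresses from the prefix owned by one host."""

    def __init__(self, prefix: str):
        self.network = ipaddress.IPv6Network(prefix)
        self._next = 1  # skip the all-zeros address of the prefix
        self._by_container = {}

    def allocate(self, container_id: str) -> ipaddress.IPv6Address:
        # Allocation is idempotent per container and purely host-local.
        if container_id in self._by_container:
            return self._by_container[container_id]
        if self._next >= self.network.num_addresses:
            raise RuntimeError("host prefix exhausted")
        addr = self.network[self._next]
        self._next += 1
        self._by_container[container_id] = addr
        return addr


host1 = HostScopeAllocator(host_prefix("beef::/64", 1))
print(host1.allocate("web-1"))
print(host1.allocate("web-2"))
```

Because every host prefix is a distinct slice of one routable range, each container address stays globally addressable without NAT.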

Page 4: Cilium - Fast IPv6 Container Networking with BPF and XDP

IPv6 Status in Kubernetes/Docker

● Kubernetes (CNI): Almost there

– Pods are IPv6-only capable as of k8s 1.3.6 (PR23317, PR26438, PR26439, PR26441)

– Kubeproxy (services) not done yet

● Docker (libnetwork): Working on it

– PR826 - “Make IPv6 Great Again”. Not merged yet

Page 5: Cilium - Fast IPv6 Container Networking with BPF and XDP

Scaling Policy

[Figure: LB, Frontend and Backend containers.]

Page 6: Cilium - Fast IPv6 Container Networking with BPF and XDP

Scaling Policy

[Figure: LB, Frontend (FE) and Backend (BE) containers.]

Policy:

– LB FE

– FE BE

NetworkPolicy: Kubernetes policy spec as discussed and standardized in the Networking SIG

https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/network-policy.md

Page 7: Cilium - Fast IPv6 Container Networking with BPF and XDP

Scaling Policy

[Figure: the LB, Frontend (FE) and Backend (BE) containers exist in two environments, QA and Prod: LB QA, FE QA, BE QA and LB Prod, FE Prod, BE Prod.]

Policy:

– LB FE

– FE BE

Page 8: Cilium - Fast IPv6 Container Networking with BPF and XDP

Scaling Policy

[Figure: QA and Prod instances of the LB, Frontend (FE) and Backend (BE) containers: LB QA, FE QA, BE QA and LB Prod, FE Prod, BE Prod.]

Policy:

– LB FE

– FE BE

– QA requires QA

– Prod requires Prod

“requires” is a Cilium extension, not yet part of the Kubernetes spec.
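The interplay of the role pairs and the `requires` rules can be sketched as below. The semantics are assumptions made for illustration, since the slide only shows the pairs: each pair is read as "source role may talk to destination role", and `requires` is read as "a destination carrying this label only accepts peers that carry the required labels". The names `ALLOW`, `REQUIRES` and `allowed` are hypothetical.

```python
# Assumed reading of the slide: (source role, destination role) pairs.
ALLOW = {("LB", "FE"), ("FE", "BE")}

# Assumed reading of "requires": label on destination -> labels the peer must carry.
REQUIRES = {"QA": {"QA"}, "Prod": {"Prod"}}


def allowed(src_labels, dst_labels) -> bool:
    """Evaluate the label-based policy for one source/destination pair."""
    src, dst = frozenset(src_labels), frozenset(dst_labels)
    # Every requirement attached to a destination label must be met by the source.
    for label in dst:
        if not REQUIRES.get(label, set()) <= src:
            return False
    # At least one role pair must permit the connection.
    return any((s, d) in ALLOW for s in src for d in dst)


print(allowed({"LB", "QA"}, {"FE", "QA"}))    # QA load balancer to QA frontend
print(allowed({"LB", "QA"}, {"FE", "Prod"}))  # QA load balancer to Prod frontend
```

Note that no address appears anywhere in the policy: adding a Staging environment would only add one `requires` entry, not one rule per container pair.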

Page 9: Cilium - Fast IPv6 Container Networking with BPF and XDP

Scaling Policy Enforcement

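[Figure: the policy (LB FE, FE BE, QA requires QA, Prod requires Prod) shown next to a distributed label ID table.]

Distributed Label ID Table: each label combination (LB QA, FE QA, BE QA, LB Prod, FE Prod, BE Prod) is assigned a numeric ID. The example uses IDs 10 to 15, with FE Prod = 14 and BE Prod = 15.

This ID is carried in the packet as metadata to provide security context at the destination host.

Policy enforcement cost becomes a single hashtable lookup regardless of the number of containers or policy complexity.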
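The idea of reducing enforcement to one lookup can be sketched in a few lines. This is a simplified model under stated assumptions, not Cilium's data path: `LabelIDTable`, `is_allowed`, `build_policy_map` and `enforce` are hypothetical names, IDs are handed out in first-seen order starting at 10, and the policy semantics are the same assumed reading of the role pairs and `requires` rules. The full policy is evaluated once per label combination when the map is built; per packet, only the ID is looked up.

```python
class LabelIDTable:
    """Maps each distinct set of labels to a small numeric ID."""

    def __init__(self, first_id: int = 10):
        self._ids = {}
        self._next = first_id

    def get_id(self, labels) -> int:
        key = frozenset(labels)
        if key not in self._ids:
            self._ids[key] = self._next
            self._next += 1
        return self._ids[key]

    def items(self):
        return self._ids.items()


def is_allowed(src: frozenset, dst: frozenset) -> bool:
    """Slow path: full policy evaluation (simplified, assumed semantics)."""
    allow = {("LB", "FE"), ("FE", "BE")}
    same_env = all(env not in dst or env in src for env in ("QA", "Prod"))
    return same_env and any((s, d) in allow for s in src for d in dst)


def build_policy_map(dst_labels, table: LabelIDTable) -> dict:
    """Precompute, for one destination container, which source IDs may connect."""
    dst = frozenset(dst_labels)
    return {src_id: True for labels, src_id in table.items() if is_allowed(labels, dst)}


def enforce(policy_map: dict, src_id: int) -> bool:
    """Fast path: one hashtable lookup on the ID carried with the packet."""
    return src_id in policy_map


table = LabelIDTable()
names = ["LB QA", "FE QA", "BE QA", "LB Prod", "FE Prod", "BE Prod"]
ids = {name: table.get_id(name.split()) for name in names}
fe_qa_map = build_policy_map(["FE", "QA"], table)
print(enforce(fe_qa_map, ids["LB QA"]), enforce(fe_qa_map, ids["LB Prod"]))
```

The cost of `enforce` does not depend on how many containers share a label combination or on how many rules were evaluated to build the map.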

Page 10: Cilium - Fast IPv6 Container Networking with BPF and XDP

Extensibility

Page 11: Cilium - Fast IPv6 Container Networking with BPF and XDP

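BPF – Berkeley Packet Filter

[Figure: in userspace, source code is compiled by LLVM/clang into bytecode. In the kernel, the bytecode passes through the verifier + JIT (native code such as "add eax,edx; shl eax,2") and is attached at sockets or at the TC ingress and TC egress hooks of the netdevices on either side of the network stack.]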

Page 12: Cilium - Fast IPv6 Container Networking with BPF and XDP

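BPF Maps & Perf Ring Buffer

[Figure: BPF programs in the kernel and userspace processes share data through BPF maps (hashtable, array). A BPF program passes data to a userspace process through the perf ring buffer. BPF programs chain to each other via tail calls.]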

Page 13: Cilium - Fast IPv6 Container Networking with BPF and XDP

BPF Features (As of Aug 2016)

● Efficient data sharing via maps

– Per-CPU/global arrays & hashtables

● Rewrite packet content

● Extend/trim packet size

● Redirect to other net_device

● Attachment of tunnel metadata

● Cgroups integration

● Access to high performance perf ring buffer

● …
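The per-CPU map idea can be modelled in plain Python. This is only a conceptual model, not the kernel API: `PerCPUArray` is a hypothetical class, and the CPU index is passed explicitly, whereas a real BPF program implicitly operates on the slot of the CPU it runs on. Each CPU updates its own copy of a value without locking, and the reader sums the copies.

```python
class PerCPUArray:
    """Conceptual model of a per-CPU array map: one private row per CPU."""

    def __init__(self, num_cpus: int, num_slots: int):
        self._values = [[0] * num_slots for _ in range(num_cpus)]

    def add(self, cpu: int, slot: int, delta: int = 1) -> None:
        # Writer side: touches only the row of the CPU it runs on,
        # so writers on different CPUs never contend.
        self._values[cpu][slot] += delta

    def read(self, slot: int) -> int:
        # Reader side (userspace): aggregates the per-CPU copies.
        return sum(row[slot] for row in self._values)


DROPPED, FORWARDED = 0, 1
stats = PerCPUArray(num_cpus=4, num_slots=2)
stats.add(cpu=0, slot=FORWARDED)
stats.add(cpu=3, slot=FORWARDED)
stats.add(cpu=1, slot=DROPPED)
print(stats.read(FORWARDED), stats.read(DROPPED))
```

A global array or hashtable would instead hold a single shared copy, which suits state that all CPUs must agree on, such as a policy table.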

Page 14: Cilium - Fast IPv6 Container Networking with BPF and XDP

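XDP – Express Data Path

[Figure: in userspace, source code is compiled by LLVM/clang into bytecode. In the kernel, the bytecode passes through the verifier + JIT (native code such as "add eax,edx; shl eax,2") and runs in the driver with access to the DMA buffer, below the netdevice, network stack and sockets.]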

Page 15: Cilium - Fast IPv6 Container Networking with BPF and XDP

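Cilium Architecture

[Figure: orchestration systems connect through plugins to the Cilium layer, which consists of the Cilium daemon, Cilium CLI, Cilium monitor, policy repository and code generation. Bytecode is injected from the Cilium layer into the kernel, where BPF programs with conntrack and policy run and a BPF program is attached to eth0. Events flow from the kernel back to the Cilium layer.]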

Page 16: Cilium - Fast IPv6 Container Networking with BPF and XDP

Why is this awesome?

On the fly BPF program generation means:

● Extensibility of userspace networking in the kernel

● MAC, IP, port number, … all become constants → compiler can optimize heavily!

● BPF programs can be recompiled and replaced without interrupting the container and its connections

– Features can be compiled in/out at runtime with container granularity

● Access to fast BPF maps and perf ring buffer to interact with userspace.

– Drop monitor in n*Mpps context

– Use notifications for policy learning, IDS, logging, ...
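One way to picture "everything becomes a constant" is a small generator that renders a per-container C header, which would then be compiled together with the BPF source. This is a sketch under assumptions: the macro names (`ENDPOINT_MAC`, `ENDPOINT_IPV6`, `ENDPOINT_LABEL_ID`, `ENABLE_*`) and the function `render_endpoint_header` are hypothetical and not taken from the Cilium source.

```python
import ipaddress


def render_endpoint_header(mac: str, ipv6: str, label_id: int, features: dict) -> str:
    """Render per-container constants as a C header (hypothetical macro names)."""
    mac_bytes = [int(part, 16) for part in mac.split(":")]
    if len(mac_bytes) != 6 or any(b > 0xFF for b in mac_bytes):
        raise ValueError("invalid MAC address")
    ip_bytes = ipaddress.IPv6Address(ipv6).packed

    def c_array(values) -> str:
        return "{ " + ", ".join(f"0x{b:02x}" for b in values) + " }"

    lines = [
        "/* Generated per container; values are compile-time constants. */",
        f"#define ENDPOINT_MAC {c_array(mac_bytes)}",
        f"#define ENDPOINT_IPV6 {c_array(ip_bytes)}",
        f"#define ENDPOINT_LABEL_ID {label_id}",
    ]
    # Features are compiled in only when enabled for this container.
    for name, enabled in sorted(features.items()):
        if enabled:
            lines.append(f"#define ENABLE_{name.upper()} 1")
    return "\n".join(lines) + "\n"


header = render_endpoint_header(
    mac="02:00:00:00:00:01",
    ipv6="beef::1:1",
    label_id=11,
    features={"nat46": True, "debug": False},
)
print(header)
```

Regenerating the header and recompiling yields a new program for that one container only, which matches the container-granularity claim above; a disabled feature leaves no code in the compiled program at all.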

Page 17: Cilium - Fast IPv6 Container Networking with BPF and XDP

Available Building Blocks

● L3 forwarding (IPv6 & IPv4)

● Host connectivity

● Encapsulation (VXLAN/Geneve/GRE)

● ICMPv6 generation

● NDisc & ARP responder

● Access Control

Currently working on:

● Fragmentation handling

● Mobility

● Port Mapping (TCP/UDP)

● Connection tracking

● L3/L4 Load Balancer

● Statistics

● Events (perf ring buffer)

● Debugging framework

● NAT46

● End to end encryption

Page 18: Cilium - Fast IPv6 Container Networking with BPF and XDP

Networking should be invisible, it is not.

Simplicity

Page 19: Cilium - Fast IPv6 Container Networking with BPF and XDP

Simplicity

● L3 only (Calico gets this right)

– No L2 scaling issues, no broadcast domains, no L2 vulnerabilities

● No “Networks”

– No need for containers to join multiple networks to access multiple isolation domains. No need for multiple addresses.

● Policy definition independent of addressing

– As specified in Kubernetes Networking SIG

– All policies based on container labels

Page 20: Cilium - Fast IPv6 Container Networking with BPF and XDP

Performance

[Figure: Container to container on local node. X-axis: # Cores (1 to 22). Y-axis: Gbit (0 to 600).]

netperf -t TCP_SENDFILE -H beef::aa0:18:ee5e

1 TCP flow per core, 10’000 policies

Intel Xeon 3.5 GHz Sandy Bridge, 24 cores

Page 21: Cilium - Fast IPv6 Container Networking with BPF and XDP

Performance

[Figure: Container to container over 10GiB NICs. X-axis: # Cores (1 to 22). Y-axis: MBit (0 to 10000). Series: 64, 128, 256, 512, 1024, 64000.]

netperf -t TCP_SENDFILE -H beef::aa0:18:ee5e

1 TCP flow per core, 10’000 policies

Intel Xeon 3.5 GHz Sandy Bridge, 24 cores

Page 22: Cilium - Fast IPv6 Container Networking with BPF and XDP

<Insert Cool Demo Here>

Page 23: Cilium - Fast IPv6 Container Networking with BPF and XDP

Q&A

Image Sources:

● Cover (Toronto): Rick Harris (https://www.flickr.com/photos/rickharris/)

● The Invisible Man: Dr. Azzacov (https://www.flickr.com/photos/drazzacov/)

Start hacking with BPF for containers: http://github.com/cilium/cilium

Contact:

Slack: cilium.slack.com

Twitter: @tgraf__

Mail: [email protected]

Team:

● André Martins

● Daniel Borkmann

● Madhu Challa

● Thomas Graf