Docker Belgium Meetup

Docker Meetup: User Namespaces & Multi-Architecture Support

Phil Estes

Senior Technical Staff Member, Open Technologies, IBM Cloud

@estesp

[email protected]

I work for IBM’s Cloud division

We have a large organization focused on open cloud technologies, including CloudFoundry, OpenStack, and Docker.

I have been working upstream in the Docker community since July 2014, and am currently a Docker core maintainer.

I have interests in runC (IBM is a founding member of OCI), libnetwork, and the docker/distribution project (Registry v2).

Trivia: I worked in IBM’s Linux Technology Center for over 10 years!

Hello!


Why user namespaces?

Unprivileged Root

Currently, by default, the user inside the container is root; more specifically uid = 0, gid = 0. If a breakout were to occur, the container user is root on the host system.

Multitenancy

Sharing Docker compute resources among more than one user requires isolation between tenants. Providing uid/gid ranges per tenant will allow for this separation.

User Accounting

Any per-user accounting capabilities are useless if everyone is root. Specifying unique uids enables resource limitations specific to a user/uid.


Docker Security

User namespaces are only one piece of the puzzle.

AppArmor/SELinux, Notary, image security, and proper environment/network security all play a part in the overall Docker security picture.

Linux user namespaces

◉ Available as a clone() flag [CLONE_NEWUSER] in Linux kernel 3.8 (some work completed in 3.9)

◉ Per-process namespace to map user and group IDs to a specified set of numeric ranges

[Diagram labels: parent process; clone(.., .. | CLONE_NEWUSER); uid = 1000, gid = 1000, pid = 8899; uid = 0, gid = 0]
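The range mapping described in the bullets above can be sketched in a few lines of Go. This is only an illustration of the arithmetic: the kernel performs the translation itself, and the names `idMap` and `toHost` are hypothetical, chosen to mirror the three columns of a `/proc/<pid>/uid_map` line (ID inside the namespace, ID outside, length).

```go
package main

import "fmt"

// idMap mirrors one line of /proc/<pid>/uid_map: a contiguous range of IDs
// inside the namespace mapped onto a range of IDs outside it.
// (Hypothetical type, for illustration only.)
type idMap struct {
	ContainerID int // first ID inside the namespace
	HostID      int // first ID on the host
	Size        int // number of IDs in the range
}

// toHost translates an ID seen inside the namespace to the host ID.
// ok is false when the ID is not covered by any range.
func toHost(maps []idMap, id int) (hostID int, ok bool) {
	for _, m := range maps {
		if id >= m.ContainerID && id < m.ContainerID+m.Size {
			return m.HostID + (id - m.ContainerID), true
		}
	}
	return 0, false
}

func main() {
	// Same mapping as the diagram: root inside is uid 1000 outside.
	maps := []idMap{{ContainerID: 0, HostID: 1000, Size: 1}}

	host, ok := toHost(maps, 0)
	fmt.Println(host, ok) // 1000 true

	_, ok = toHost(maps, 1)
	fmt.Println(ok) // false: uid 1 has no mapping
}
```

An ID without a mapping has no identity outside the namespace, which is why the size of the range matters as much as its starting point.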


“Most notably, a process can have a nonzero user ID outside a namespace while at the same time having a user ID of zero inside the namespace; in other words, the process is unprivileged for operations outside the user namespace but has root privileges inside the namespace.”

Michael Kerrisk, February 27, 2013, https://lwn.net/Articles/532593/


User namespaces and Go

◉ Available since Go version 1.4 (December 2014) as fields within the syscall.SysProcAttr structure: the UidMappings and GidMappings slices

◉ Thanks to good work from Mrunal Patel and Michael Crosby laying the Go groundwork for user namespace capability within Docker/libcontainer (https://github.com/golang/go/issues/8447)


Go user namespaces example

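// Allocate the struct; assigning through a nil *SysProcAttr would panic.
sys := &syscall.SysProcAttr{
	// Start the child in a new user namespace.
	Cloneflags: syscall.CLONE_NEWUSER,
	// Map uid/gid 0 inside the namespace to uid/gid 1000 on the host.
	UidMappings: []syscall.SysProcIDMap{
		{ContainerID: 0, HostID: 1000, Size: 1},
	},
	GidMappings: []syscall.SysProcIDMap{
		{ContainerID: 0, HostID: 1000, Size: 1},
	},
}

cmd := exec.Cmd{
	Path:        "/bin/bash",
	SysProcAttr: sys,
}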

When we run this code we’ll have a command that, when executed, will appear to be running as root (uid/gid = 0), but will actually be the non-privileged user with uid/gid = 1000 mapped inside the user namespace to root.


Layer Sharing

Docker images are downloaded to the local daemon’s cache from a registry and expanded into the storage driver’s subtree by ID.

Docker Image Layer Details

FROM ubuntu:15.04
…
RUN apt-get install libdevmapper \
    libdevmapper-dev
…
COPY issue crontab /etc/

Files in the image layers (b3ef6, 98a5f, 55b9d), all owned by root:root:

-rw-r--r-- root:root /etc/issue
-rw-r--r-- root:root /etc/crontab
-rwxr-xr-x root:root /bin/sh
-rwxr-xr-x root:root /bin/ip
-r-xr-xr-x root:root /lib/libdevmapper.so.1.02
-r--r--r-- root:root /usr/lib/libdevmapper.a

[Diagram: image useless:1.0 consists of layers b3ef6, 98a5f, 55b9d. Two separate "docker run useless:1.0 /bin/sh" invocations each share those three layers, with 79cc4 and cc4af as their respective additional layers.]

Layer Sharing Solution

Given:

◉ Already mentioned: one metadata subtree per remapped root

◉ Remapped root setting is daemon-wide (for all containers running in this instance)

Therefore we:

◉ Untar all layers per the user namespace uid/gid mapping provided at daemon start

◉ All layers are usable (correct ownership) by any container in this daemon instance

restriction?


Layer Solution Pros/Cons

Pros

No ugly chown -R uid:gid <huge file tree> work to do at container start time.

For a daemon-wide user namespace setting, this solution works perfectly for the general “don’t be root” case.

Cons

Restarting the daemon with/without remapped roots resets the metadata cache (must re-pull images, no prior container history).

Some increased disk cost if daemon is started with unique remappings or turned on/off.

> User namespace support in Linux kernel 3.8 (early 2013)

> User namespace support in Go 1.4 (December 2014)

> User namespace support in libcontainer (February 2015)

> User namespace support available now in “experimental” build of Docker engine

http://integratedcode.us/2015/10/13/user-namespaces-have-arrived-in-docker/




Demo Time!

A brief look at a Docker engine instance with user namespaces enabled


$ docker run -v /bin:/host/bin -ti busybox /bin/sh
/ # id
uid=0(root) gid=0(root) groups=10(wheel)
/ # cd /host/bin
/host/bin # mv sh old
mv: can't rename 'sh': Permission denied
/host/bin # cp /bin/busybox ./sh
cp: can't create './sh': File exists

Multi-Architecture Support

What should happen?

$ docker run redis
$ docker run nginx
$ docker run mysql
$ docker run wordpress
$ docker run node
$ docker run ...

But what if I have ...


What does happen?

$ docker run redis
FATA[0003] Error response from daemon: Cannot start container 0f0fa3f8...: exec format error

Hacking Multi-Arch

◉ Creating special Hub repositories for an architecture’s images (e.g. “ppc64le” for little-endian POWER 64-bit)

◉ Standardized prefix for all images for a given architecture (e.g. “hypriot” rpi-<name> images for Raspberry Pi)

◉ Tags for architecture (e.g. ubuntu:amd64, ubuntu:arm, ubuntu:ppc64le)
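One way to see why these are workarounds rather than a solution: each convention turns the same logical image into a different reference, and the user has to know which convention a given publisher chose. The sketch below only does the string construction; the function names are hypothetical and the image names are illustrative, not a claim about what exists on the Hub.

```go
package main

import "fmt"

// archRepo: a per-architecture repository namespace, e.g. ppc64le/redis.
func archRepo(arch, name string) string {
	return arch + "/" + name
}

// prefixed: a publisher-specific name prefix, e.g. hypriot/rpi-redis.
func prefixed(org, prefix, name string) string {
	return org + "/" + prefix + name
}

// archTag: the architecture carried in the tag, e.g. ubuntu:arm.
func archTag(name, arch string) string {
	return name + ":" + arch
}

func main() {
	fmt.Println(archRepo("ppc64le", "redis"))         // ppc64le/redis
	fmt.Println(prefixed("hypriot", "rpi-", "redis")) // hypriot/rpi-redis
	fmt.Println(archTag("ubuntu", "arm"))             // ubuntu:arm
}
```

None of the three lets a plain "docker run redis" do the right thing on every platform, since the architecture has to be spelled out by hand in the image name.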

The Right Solution

https://github.com/docker/distribution/pull/1068

> A new Docker registry manifest format!

◉ Properly handles and records runtime architecture information (assembled on `docker push`)

◉ Enables a new “fat manifest” type which can contain references to multiple single-architecture manifests, keyed on architecture details (os, arch, model, etc.)


What should happen!

$ docker run redis

<engine> What architecture am I? (os/arch/model details)
<engine> query registry for manifest
<registry> respond with fat manifest, if exists
<engine> parse fat manifest, look for section matching my os/arch/model details
<engine> found section; now parse layer list; request items from registry
<registry> respond with layer blobs for requested digest/hashes
<engine> start container using layer data matching local engine os/arch/model
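The engine-side matching step in the exchange above could look roughly like the following. This is a simplified sketch under stated assumptions: it matches on os and arch only (no model), the types and `selectManifest` are hypothetical, and the registry round trips are replaced by an in-memory list with placeholder digests.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// platform identifies what the local engine runs on (hypothetical type).
type platform struct {
	OS, Arch string
}

// entry is one section of a fat manifest: platform details plus the digest
// of the matching single-architecture manifest.
type entry struct {
	Platform platform
	Digest   string
}

var errNoMatch = errors.New("no manifest matches the local platform")

// selectManifest returns the digest of the first entry whose platform
// equals the wanted one.
func selectManifest(entries []entry, want platform) (string, error) {
	for _, e := range entries {
		if e.Platform == want {
			return e.Digest, nil
		}
	}
	return "", errNoMatch
}

func main() {
	// Step 1: "What architecture am I?"
	local := platform{OS: runtime.GOOS, Arch: runtime.GOARCH}

	// Steps 2-3: stand-in for the fat manifest returned by the registry.
	entries := []entry{
		{Platform: platform{OS: "linux", Arch: "amd64"}, Digest: "sha256:placeholder-amd64"},
		{Platform: platform{OS: "linux", Arch: "arm"}, Digest: "sha256:placeholder-arm"},
		{Platform: platform{OS: "linux", Arch: "ppc64le"}, Digest: "sha256:placeholder-ppc64le"},
	}

	// Step 4: look for the section matching the local platform.
	digest, err := selectManifest(entries, local)
	if err != nil {
		fmt.Printf("%s/%s: %v\n", local.OS, local.Arch, err)
		return
	}
	// Steps 5-7 would fetch the layers listed by this manifest and start
	// the container.
	fmt.Printf("%s/%s -> %s\n", local.OS, local.Arch, digest)
}
```

With a lookup like this, a missing architecture can be reported at pull time as "no match" instead of surfacing later as an exec format error.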

Thanks!

Any questions?

@estesp

github.com/estesp

[email protected]

http://integratedcode.us




Credits

Special thanks to all the people who made and released these awesome resources for free:

◉ Presentation template by SlidesCarnival


http://www.slidescarnival.com/
