Docker Principles and Implementation

Sep 08, 2014

ya790026

the technology behind docker.
This is for osdc.tw 2014
Transcript
Page 1: Docker Principles and Implementation

Docker Principles and Implementation

果凍 (Jelly)

Page 2: Docker Principles and Implementation

Introduction

● Works at 迎廣科技
  ○ python
  ○ openstack

● http://about.me/ya790206
● http://blog.blackwhite.tw/
● https://github.com/ya790206/call_seq

Page 3: Docker Principles and Implementation

Agenda

● linux kernel namespace
● seccomp
● cgroup
● lxc
● docker

Page 4: Docker Principles and Implementation

docker

● lightweight, portable, self-sufficient containers.

● processes running in one container are isolated from processes running in other containers.

Page 5: Docker Principles and Implementation

Linux startup process

● Linux startup process
  ○ Boot loader ->
  ○ Kernel ->
  ○ Init process

● Difference between Linux distros:
  ○ package manager
  ○ init

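Page 6: Docker Principles and Implementation

Docker

[Diagram: Docker sits on top of the following building blocks]

● aufs
● lxc
● Kernel namespaces
● AppArmor and SELinux profiles
● Seccomp policies
● Control groups
● Kernel capabilities
● Chroots
● btrfs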

Page 7: Docker Principles and Implementation

kernel namespace

● The purpose of each namespace is to wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.

● Private view

Page 8: Docker Principles and Implementation

kernel pid namespace

root pid namespace
  pid 1 (pid 1)
  pid 2 (pid 2)
  pid namespace x
    pid 3 (pid 1)
    pid 4 (pid 2)

● first number (black on the slide): the real pid.
● number in parentheses (red on the slide): the pid the process gets from getpid().
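The two views of a pid can be read back from the kernel. As a minimal sketch, assuming a kernel new enough to expose the `NSpid:` line in `/proc/<pid>/status`, the helper below parses that line; the `parse_nspid` name and the sample text are illustrative and not part of the talk.

```python
def parse_nspid(status_text):
    """Return the pids listed on the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            # Fields after the label are one pid per nested PID namespace.
            return [int(field) for field in line.split()[1:]]
    return []  # no NSpid line (older kernel or unrelated text)


# Hypothetical status excerpt matching "pid 4 (pid 2)" from the slide.
sample = "Name:\tbash\nPid:\t4\nNSpid:\t4\t2\n"
pids = parse_nspid(sample)
print("outer pid:", pids[0], "pid seen inside the namespace:", pids[-1])
```

On a real system the text would come from reading `/proc/<pid>/status` in the outer namespace.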

Page 9: Docker Principles and Implementation

kernel namespace

● Mount namespaces
● UTS namespaces
● PID namespaces
● Network namespaces
● User namespaces
● IPC namespaces
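One unprivileged way to see which namespaces a process belongs to is to read the symlinks under `/proc/<pid>/ns`. The sketch below assumes a Linux system with `/proc` mounted; the `list_namespaces` helper is a name chosen here for illustration.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identifier} read from /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):  # not Linux, or /proc is not mounted
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink with a target such as "pid:[4026531836]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry disappeared or is not readable
    return result


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:18s} {ident}")
```

Two processes that show the same identifier for a given type share that namespace; a process started with the matching `CLONE_NEW*` flag shows a different one.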

Page 10: Docker Principles and Implementation

int child_pid = clone(child_main, child_stack + STACK_SIZE,
                      CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | SIGCHLD,
                      NULL);

● https://gist.github.com/ya790206/9855021

Page 11: Docker Principles and Implementation

The tail isn't hidden well (the isolation still leaks)

Page 12: Docker Principles and Implementation

int child_pid = clone(child_main, child_stack + STACK_SIZE,
                      CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD,
                      NULL);

/* in the child: remount /proc inside the new mount namespace */
mount("proc", "/proc", "proc", 0, NULL);

● https://gist.github.com/ya790206/9855094

Page 13: Docker Principles and Implementation

seccomp

● A process running in seccomp mode is severely limited in what it can do.

● In strict mode it can make only four system calls: exit(), sigreturn(), and read() and write() on already-open file descriptors.
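A rough way to observe strict mode is to fork a child, enable it with `prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT)`, and watch what the kernel does when the child tries any other system call. This is a sketch under stated assumptions: Linux, a libc reachable through `ctypes`, and no seccomp filter already installed on the process (strict mode is refused in that case). The `strict_mode_demo` function is illustrative and not taken from the talk's gist.

```python
import ctypes
import os
import signal

PR_SET_SECCOMP = 22       # value from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1   # value from <linux/seccomp.h>


def strict_mode_demo():
    """Run a child in strict seccomp mode; return (outcome, bytes_written).

    outcome is "killed" if the kernel killed the child with SIGKILL, or
    "unavailable" if strict mode could not be enabled on this system.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
                os._exit(2)            # the kernel refused strict mode
            os.write(write_fd, b"ok")  # write() to an open fd is allowed
            os.getpid()                # not one of the four calls
        finally:
            os._exit(3)                # reached only if the child was not killed
    os.close(write_fd)
    written = os.read(read_fd, 2)      # returns b"" if the child died first
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
        return "killed", written
    return "unavailable", written


if __name__ == "__main__":
    print(strict_mode_demo())
```

Even a normal process exit goes through `exit_group()`, which is outside the allowed set, so a strict-mode process that wants to end cleanly has to call the raw `exit()` system call.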

Page 14: Docker Principles and Implementation

libseccomp example

https://gist.github.com/ya790206/9579145

Page 15: Docker Principles and Implementation

cgroup

● This work was started by engineers at Google

● Resource limiting
● Prioritization
● Accounting
● Control

Page 16: Docker Principles and Implementation

cgroup

○ blkio: this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, USB, etc.).
○ cpu: this subsystem uses the scheduler to provide cgroup tasks access to the CPU.
○ cpuacct: this subsystem generates automatic reports on CPU resources used by tasks in a cgroup.
○ cpuset: this subsystem assigns individual CPUs (on a multicore system) and memory nodes to tasks in a cgroup.
○ devices: this subsystem allows or denies access to devices by tasks in a cgroup.
○ freezer: this subsystem suspends or resumes tasks in a cgroup.
○ memory: this subsystem sets limits on memory use by tasks in a cgroup, and generates automatic reports on memory resources used by those tasks.
○ net_cls: this subsystem tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup task.
○ net_prio: this subsystem provides a way to dynamically set the priority of network traffic per network interface.
○ ns: the namespace subsystem.
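To see which cgroup a task belongs to for each subsystem, the kernel exposes `/proc/<pid>/cgroup`, one line per hierarchy in the form `hierarchy-ID:controller-list:cgroup-path`. The parser below is a small sketch; `parse_proc_cgroup` and the sample text are made up for illustration.

```python
def parse_proc_cgroup(text):
    """Map each controller (subsystem) name to the task's cgroup path."""
    membership = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Format: hierarchy-ID:controller-list:cgroup-path
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            if controller:  # empty for the cgroup v2 unified hierarchy
                membership[controller] = path
    return membership


# Hypothetical content of /proc/<pid>/cgroup for a task placed in "/my".
sample = "4:freezer:/my\n3:cpu,cpuacct:/\n2:memory:/my\n"
print(parse_proc_cgroup(sample))
```

On a live system the input would be the contents of `/proc/self/cgroup`.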

Page 17: Docker Principles and Implementation

cgroup freezer

● The cgroup freezer is useful to batch job management systems, which start and stop sets of tasks in order to schedule the resources of a machine according to the desires of a system administrator.

Page 18: Docker Principles and Implementation

$ mount -t cgroup -ofreezer freezer /<path>/freezer
$ mkdir /<path>/freezer/my

/<path>/freezer: root cgroup
  tasks (all processes), other files, my

/<path>/freezer/my: sub cgroup
  tasks (pids), other files

Page 19: Docker Principles and Implementation

cgroup freezer

$ mount -t cgroup -ofreezer freezer /<path>/freezer
$ cd /<path>/freezer/; ls
cgroup.clone_children cgroup.event_control cgroup.procs cgroup.sane_behavior notify_on_release release_agent tasks

1. mkdir my_group; cd my_group
2. echo $some_pid > tasks
3. echo FROZEN > freezer.state
4. echo THAWED > freezer.state
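The same freezer steps can be driven from a program, since the cgroup interface is just files. The sketch below assumes a cgroup v1 freezer hierarchy mounted at a path you pass in (root privileges are needed on a real mount); the function names are chosen here for illustration.

```python
import os


def add_task(group_dir, pid):
    """Move a process into the group by writing its pid to the tasks file."""
    with open(os.path.join(group_dir, "tasks"), "w") as f:
        f.write(f"{pid}\n")


def set_freezer_state(group_dir, state):
    """Write FROZEN or THAWED to the group's freezer.state file."""
    if state not in ("FROZEN", "THAWED"):
        raise ValueError("state must be FROZEN or THAWED")
    with open(os.path.join(group_dir, "freezer.state"), "w") as f:
        f.write(state + "\n")


def freeze_pid(freezer_root, group_name, pid):
    """Create the sub cgroup, add the pid, and freeze the group."""
    group_dir = os.path.join(freezer_root, group_name)
    os.makedirs(group_dir, exist_ok=True)  # same effect as: mkdir my_group
    add_task(group_dir, pid)               # echo $some_pid > tasks
    set_freezer_state(group_dir, "FROZEN") # echo FROZEN > freezer.state
    return group_dir
```

Calling `set_freezer_state(group_dir, "THAWED")` afterwards resumes every task in the group. Pointing `freezer_root` at an ordinary temporary directory is a harmless way to try the file layout without touching real processes.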

Page 20: Docker Principles and Implementation

other cgroup

● memory cgroup:
  ○ limit process memory usage.
  ○ show various statistics

● blkio cgroup:
  ○ change I/O weight
  ○ show various statistics
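The statistics are exposed as plain text files inside each cgroup directory. As a sketch, assuming the cgroup v1 memory controller's `memory.stat` file with one `name value` pair per line, a reader could look like this; the helper name and the sample numbers are invented for illustration.

```python
def parse_memory_stat(text):
    """Turn 'name value' lines from memory.stat into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


# Hypothetical excerpt; real files contain many more counters.
sample = "cache 4096\nrss 1048576\nmapped_file 0\n"
stats = parse_memory_stat(sample)
print("rss bytes:", stats["rss"])
```

In the v1 memory controller the limit itself is set the same way the freezer is driven, by writing a byte count to `memory.limit_in_bytes` in the group directory.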

Page 21: Docker Principles and Implementation

lxc

● LXC is a userspace interface for the Linux kernel containment features.

● Container templates
● A set of standard tools to control the containers

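Page 22: Docker Principles and Implementation

lxc

[Diagram: the host OS contains container A (process 1, process 2), container B (process 3, process 4), and a host process x. Legend: a one-way arrow from A to B means A can see B; a two-way arrow between A and B means A can see B and B can see A.]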

Page 23: Docker Principles and Implementation

lxc

1. lxc-create -n test-container -t ubuntu
2. lxc-ls --fancy
3. lxc-start -n test-container
4. lxc-console -n test-container
5. lxc-stop -n test-container
6. lxc-destroy -n test-container

Page 24: Docker Principles and Implementation

start vs execute

● start:
  ○ boot linux system

● execute:
  ○ execute program directly
  ○ make sure you have "/usr/lib/lxc/lxc-init" in your container

Page 25: Docker Principles and Implementation

sudo lxc-checkpoint --name p1 --statefile a

● output:
  ○ lxc-checkpoint: 'checkpoint' function not implemented

Page 26: Docker Principles and Implementation

linux aufs

● It allows files and directories of separate filesystems to co-exist under a single directory.

[Diagram: /tmp/union is the union of /tmp/a, /tmp/b, and /tmp/c]

Page 27: Docker Principles and Implementation

# apt-get install aufs-tools

# mount -t aufs -o br=/tmp/a:/tmp/b none /tmp/union/

# mount -t aufs -o br=/tmp/a=rw:/tmp/b=rw none /tmp/union
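What a union mount does on lookup can be modelled in user space without aufs: search the branches in order and take the first match. The sketch below is only a model of that idea, with hypothetical helper names; it ignores aufs details such as whiteout files and write handling.

```python
import os


def union_lookup(branches, relative_path):
    """Return the full path in the first branch that contains relative_path."""
    for branch in branches:  # earlier branches take priority
        candidate = os.path.join(branch, relative_path)
        if os.path.lexists(candidate):
            return candidate
    return None  # not present in any branch


def union_listdir(branches):
    """Return the merged, de-duplicated top-level names of all branches."""
    names = set()
    for branch in branches:
        names.update(os.listdir(branch))
    return sorted(names)
```

With branches `["/tmp/a", "/tmp/b"]`, a file that exists in both directories resolves to the copy in `/tmp/a`, while the listing shows each name once.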

Page 28: Docker Principles and Implementation

docker vs lxc

● docker is based on lxc
● docker can create an image from a text file.
● docker seldom boots a full system.
● docker provides a user-friendly interface
● docker uses less disk space (aufs).

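Page 29: Docker Principles and Implementation

docker

[Diagram: container lifecycle]

● image (rootfs) -> run -> running container (process + rootfs)
● running container -> stop -> stopped container (rootfs)
● stopped container -> start -> running container
● container -> commit -> image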

Page 30: Docker Principles and Implementation

rootfs in container (aufs, top layer first)

● image: rw
● ZZZ image: ro
● XXX image: ro
● ubuntu image: ro

rootfs in image (aufs, top layer first)

● image: ro
● ZZZ image: ro
● XXX image: ro
● ubuntu image: ro
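The layer stack behaves much like a chain of dictionaries in which only the top one is writable. As a toy model (not how Docker stores files), `collections.ChainMap` from the Python standard library has the same shape: lookups walk the layers top to bottom, and writes land in the first one. The layer contents below are invented for illustration.

```python
from collections import ChainMap

# Read-only image layers, modelled as path -> content dictionaries.
ubuntu = {"/bin/sh": "shell from ubuntu", "/etc/hostname": "ubuntu"}
xxx = {"/usr/bin/python": "python from XXX"}
zzz = {"/etc/hostname": "zzz"}

# The container adds one writable layer on top of the image layers.
container_layer = {}
rootfs = ChainMap(container_layer, zzz, xxx, ubuntu)

print(rootfs["/etc/hostname"])   # topmost layer that has the path wins: zzz
rootfs["/etc/hostname"] = "my-container"
print(rootfs["/etc/hostname"])   # now served from the writable layer
print(zzz["/etc/hostname"])      # the image layer is untouched: zzz

# "commit" in this model: freeze the writable layer as a new image layer.
new_image_layers = [dict(container_layer), zzz, xxx, ubuntu]
```

Because the lower layers never change, many containers can share them, which is where the disk-space saving comes from.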

Page 31: Docker Principles and Implementation

taiwan.py site dockerfile

FROM ubuntu:12.10

RUN apt-get update

RUN apt-get install -y python-dev

RUN apt-get install -y python-pip

RUN apt-get install -y git

RUN pip install mynt

RUN git clone https://github.com/lucemia/taiwan.py

RUN mynt gen -f taiwan.py/src/ taiwan.py/build/

EXPOSE 8000

CMD cd taiwan.py/build/ && python -m SimpleHTTPServer

Page 32: Docker Principles and Implementation

How to run

1. cat dockerfile | sudo docker build -t taiwanpy -
2. docker run -p 9000:8000 taiwanpy
3. docker stop xxx
4. docker start xxx
5. docker stop xxx
6. docker rm xxx
7. docker rmi taiwanpy

Page 33: Docker Principles and Implementation

simple docker shell

● https://github.com/ya790206/misc_tools/tree/master/docker_wrapper

Page 34: Docker Principles and Implementation

Summary

● Namespaces for virtualization.
● Cgroups for controlling a group of processes.
● Container and host system use the same kernel.
● Docker is similar to lxc, but Docker is easier to use.

Page 35: Docker Principles and Implementation

Question

Page 36: Docker Principles and Implementation

Thank you

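Page 39: Docker Principles and Implementation

References

● Linux Kernel Hacks:改善效能、提昇開發效率及節能的技巧與工具 (Linux Kernel Hacks: techniques and tools for improving performance, development efficiency, and energy saving)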