Andrey Vagin <[email protected]> CRIU - Checkpoint/Restore in User-space FOSDEM 2015
Page 1: FOSDEM2015: Live migration for containers is around the corner

Andrey Vagin <[email protected]>

CRIU - Checkpoint/Restore in User-space

FOSDEM 2015

Agenda

● History

● Under the hood

● Online (iterative) migration

● P.Haul

● How to integrate with/into CRIU

● Kernel impact

● Questions

History

● OpenVZ (2005)

– In the OpenVZ kernel

● Linux Checkpoint/Restart by Oren Laadan (2008)

– An out-of-tree (non-mainline) kernel patch set

● CRIU (2011)

– In user space, on top of the mainline kernel

How does this work?

(Diagram: crtools, the original name of the criu tool, converts kernel objects such as the process tree, namespaces, files, sockets and pipes into image files and back.)

Dump

● Parasite code (runs inside the address space of the dumped task)

– Receive file descriptors

– Dump memory content

– Dump prctl() settings, signal actions, pending signals, timers, etc.

● Ptrace

– Freeze processes

– Inject the parasite code

● Netlink

– Get information about sockets and network namespaces

● Procfs

– /proc/PID/maps, /proc/PID/map_files/, /proc/PID/status, /proc/PID/mountinfo
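During the dump, the memory layout of each task comes from procfs. As a minimal sketch, assuming only the text format of /proc/PID/maps (address range, permissions, offset, device, inode, optional path), the Python below parses it into records. The names `Vma`, `parse_maps_line` and `read_maps` are hypothetical; this is not CRIU's own (C) parser.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vma:
    """One memory mapping (VMA) as listed in /proc/PID/maps."""
    start: int
    end: int
    perms: str           # e.g. "r-xp": read, write, execute, private/shared
    offset: int          # offset into the mapped file
    path: Optional[str]  # file path, "[heap]", "[stack]", ... or None if anonymous


def parse_maps_line(line: str) -> Vma:
    # Format: start-end perms offset dev inode [path]. The path may contain
    # spaces, so split at most 5 times and keep the remainder intact.
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].rstrip("\n") if len(fields) > 5 else None
    return Vma(start, end, fields[1], int(fields[2], 16), path)


def read_maps(pid: str = "self") -> List[Vma]:
    # Linux only; reading another task's maps requires ptrace access to it.
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

For example, `read_maps("self")` lists the mappings of the calling Python process; anonymous mappings come back with `path=None`.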

Restore

● Collect shared objects

● Restore name-spaces

● Create a process tree

– Restore SID and PGID

– Restore objects that must be inherited

● Restore files, sockets, pipes, ...

● Restore per-task properties

● Restore memory

● Sim! Sala bim!

● Awesome
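Because a task's parent is whoever forked it, the process tree has to be recreated top-down: every task must exist before its children are forked. A minimal sketch of that ordering in Python, assuming the dumped tree is given as a hypothetical `{pid: ppid}` mapping; it only computes the order and ignores sessions, process groups and inherited objects, which a real restore also has to handle.

```python
from collections import deque
from typing import Dict, List


def fork_order(tasks: Dict[int, int]) -> List[int]:
    """Return PIDs so that every parent comes before its children.

    `tasks` maps pid -> ppid; a task whose parent is not in the dump
    (e.g. the container's init) is treated as a root.
    """
    children: Dict[int, List[int]] = {}
    roots: List[int] = []
    for pid, ppid in tasks.items():
        if ppid in tasks:
            children.setdefault(ppid, []).append(pid)
        else:
            roots.append(pid)

    order: List[int] = []
    queue = deque(sorted(roots))  # breadth-first walk starting from the roots
    while queue:
        pid = queue.popleft()
        order.append(pid)  # this task exists now and can fork its children
        queue.extend(sorted(children.get(pid, [])))
    return order
```

A breadth-first walk is just one valid choice; any order in which each parent precedes its children works.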

sigreturn()

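(Diagram: how a signal is delivered and returned from. In kernel mode, do_signal() calls handle_signal(), which uses setup_frame() to save the interrupted register state in a signal frame on the user stack. The signal handler then runs in user mode; when it returns, the return code on the stack enters the kernel through system_call() into sys_sigreturn(), and restore_sigcontext() reloads the saved registers so the normal program flow continues.)

CRIU uses the same mechanism at the end of restore: it builds a signal frame containing the dumped registers and calls rt_sigreturn(), so each task continues exactly where it was checkpointed.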

How to integrate with CRIU

● An RPC protocol and a shared library (libcriu)

● Plugins

● Action scripts

– Lock and unlock the network

– Set up namespaces

– etc.

● External resources

– Bind mounts of host file systems

– One end of a socket pair or a tty pair

– etc.
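CRIU runs an action script at several points of a dump or restore and passes the current action in the `CRTOOLS_SCRIPT_ACTION` environment variable (for example `network-lock`, `network-unlock` or `setup-namespaces`). A minimal sketch in Python, assuming the script is registered with `--action-script` and that a non-zero exit status makes CRIU treat the step as failed; the handler bodies are hypothetical placeholders.

```python
#!/usr/bin/env python3
"""Hypothetical CRIU action script (registered with --action-script)."""
import os
import sys
from typing import Callable, Dict


def lock_network() -> int:
    # Placeholder: e.g. install firewall rules that drop the container's traffic.
    print("locking network", file=sys.stderr)
    return 0


def unlock_network() -> int:
    # Placeholder: remove whatever lock_network() installed.
    print("unlocking network", file=sys.stderr)
    return 0


HANDLERS: Dict[str, Callable[[], int]] = {
    "network-lock": lock_network,
    "network-unlock": unlock_network,
}


def handle_action(action: str) -> int:
    # The script is invoked for every action, so unknown ones are ignored.
    handler = HANDLERS.get(action)
    return handler() if handler else 0


# Only act when CRIU invoked us (it sets CRTOOLS_SCRIPT_ACTION).
if __name__ == "__main__" and "CRTOOLS_SCRIPT_ACTION" in os.environ:
    sys.exit(handle_action(os.environ["CRTOOLS_SCRIPT_ACTION"]))
```

A dispatch table keeps each hook small and makes it easy to add, say, a `setup-namespaces` handler without touching the rest of the script.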

Who uses CRIU?

Online migration

● Suspend the container on the source host

● Transfer the images to the remote host

● Resume the container on the target host

The images can be quite big, so the downtime is too long. Instead:

● Transfer memory in a few iterations without freezing processes

● Freeze processes only on the last iteration

Transfer memory iteratively

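(Diagram: the first iteration transfers the whole memory, each following iteration transfers only the memory dirtied since the previous one, and the processes are frozen only for the last transfer.)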
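The iterative scheme can be modelled in a few lines. A minimal sketch in Python, assuming a hypothetical `dirtied_in_round` callback that reports which pages the still-running workload dirtied while a round was being copied; the round limit and stop threshold are arbitrary illustrations, not P.Haul's actual heuristics.

```python
from typing import Callable, List, Set


def precopy_rounds(
    total_pages: int,
    dirtied_in_round: Callable[[int], Set[int]],
    max_rounds: int = 5,
    stop_threshold: int = 64,
) -> List[int]:
    """Return the number of pages sent per round; the last round is the frozen one."""
    sent: List[int] = []
    to_send: Set[int] = set(range(total_pages))  # round 0 copies the whole memory
    for rnd in range(max_rounds):
        sent.append(len(to_send))        # copied while processes keep running
        to_send = dirtied_in_round(rnd)  # only pages dirtied meanwhile remain
        if len(to_send) <= stop_threshold:
            break                        # small enough: stop iterating
    sent.append(len(to_send))            # freeze and copy the last dirty set
    return sent
```

Only the last entry contributes to the downtime. If the workload dirties memory faster than it can be copied, the dirty set never shrinks and the round limit forces the freeze anyway.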

P.Haul (process hauler) - Live migration using CRIU

● Iterative

● Optimal

● Customizable

    # ./p.haul ovz 100 10.30.25.213
    Migration succeeded
    total time is ~2.86 sec
    frozen time is ~1.99 sec
    ( ['0.27', '0.18', '1.55'] )
    restore time is ~0.86 sec
    img sync time is ~0.32 sec

P.Haul (Process Hauler)

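(Diagram of the migration flow between the source host and the destination host: pre-dump and sync FS; freeze, dump and sync FS; restore on the destination. On success, the post-dump step kills the container on the source and it is resumed on the destination. On failure, rollback resumes the container on the source, followed by clean up.)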
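The flow in the diagram can be read as a small driver with a rollback path. A minimal sketch in Python, assuming hypothetical `src` and `dst` objects whose methods stand for the steps in the diagram; it is not P.Haul's real code or API, and the number of pre-dump rounds is fixed here only for brevity.

```python
def migrate(src, dst, predump_rounds: int = 2) -> bool:
    """Drive one migration; return True on success, False after a rollback."""
    for _ in range(predump_rounds):
        src.pre_dump_and_sync_fs()  # container keeps running
    src.freeze_dump_and_sync_fs()   # downtime starts here
    try:
        dst.restore()
    except Exception:
        dst.clean_up()              # discard the partial restore
        src.resume()                # rollback: keep running on the source
        return False
    src.kill()                      # post-dump: the source copy is stale now
    dst.resume()                    # downtime ends here
    return True
```

The key design point is that the source copy is killed only after the restore has succeeded, so a failure at any earlier step leaves a working container on the source host.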

New features in the kernel

● Parasite code injection (by Tejun Heo)

– Read task state that otherwise only the task itself can retrieve

● The kcmp() system call

– Helps check which kernel objects are shared between processes

● Proc map_files directory

– Find out exactly which file is mapped

– Mapping sharing information

● A bunch of prctl extensions

– Set various private properties of task/mm objects (a C/R-only feature)

● Last-pid sysctl

– Restore a task with the desired PID
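A minimal sketch in Python of how a restorer can use the last-pid sysctl, assuming root privileges in the target PID namespace and that nothing else creates processes in that namespace at the same moment (a real restorer has to serialize this, for example with a lock). `set_next_pid` and `fork_at_pid` are hypothetical names. Newer kernels (5.5 and later) can also request a specific PID through the set_tid field of clone3().

```python
import os

NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"


def set_next_pid(target_pid: int, path: str = NS_LAST_PID) -> None:
    # The kernel starts searching for a free PID at last_pid + 1,
    # so store target - 1. Writing the real sysctl needs root privileges.
    with open(path, "w") as f:
        f.write(str(target_pid - 1))


def fork_at_pid(target_pid: int) -> int:
    """fork() so that the child gets target_pid; returns like os.fork()."""
    set_next_pid(target_pid)
    pid = os.fork()
    if pid not in (0, target_pid):
        # Another task was created between the write and the fork();
        # note that the unwanted child is still running at this point.
        raise RuntimeError(f"child got PID {pid}, wanted {target_pid}")
    return pid
```

The `path` parameter exists only so the write can be exercised against an ordinary file instead of the real sysctl.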

New features in the kernel

● Socket information dumping via netlink (sock_diag)

– An extensible engine for retrieving socket state

● TCP repair mode

– Read the internal state of a TCP connection and reconstruct it from scratch on a freshly created socket

● Virtual net device indexes

– Allows restoring network devices in a namespace with their original indexes

● Socket peeking offset

– Allows peeking at socket queues (reading data without removing it from the queue)

● Task memory tracking

– Enables incremental snapshots and online migration
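Memory tracking is exposed through the soft-dirty bit (Linux 3.11 and later): writing 4 to /proc/PID/clear_refs clears it, and bit 55 of each /proc/PID/pagemap entry then shows whether the page has been written since. A minimal sketch in Python, assuming a kernel built with soft-dirty support; the helper names are hypothetical, and this shows the idea behind tracking memory between pre-dumps rather than CRIU's implementation.

```python
import mmap
import struct
from typing import List

PAGEMAP_ENTRY = 8        # each /proc/PID/pagemap entry is a 64-bit word
PM_SOFT_DIRTY = 1 << 55  # page written since the soft-dirty bits were cleared


def clear_soft_dirty(pid: str = "self") -> None:
    # Writing "4" to clear_refs resets the soft-dirty bits of all the task's pages.
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")


def read_pagemap(pid: str, start: int, npages: int,
                 page_size: int = mmap.PAGESIZE) -> bytes:
    # One entry per virtual page, indexed by virtual page number.
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((start // page_size) * PAGEMAP_ENTRY)
        return f.read(npages * PAGEMAP_ENTRY)


def soft_dirty_pages(raw: bytes) -> List[int]:
    """Indexes of the pagemap entries in `raw` whose soft-dirty bit is set."""
    count = len(raw) // PAGEMAP_ENTRY
    # Entries are stored in the machine's native byte order.
    entries = struct.unpack(f"={count}Q", raw[: count * PAGEMAP_ENTRY])
    return [i for i, entry in enumerate(entries) if entry & PM_SOFT_DIRTY]
```

A tracker would call `clear_soft_dirty()` after one memory transfer, and before the next one use `soft_dirty_pages(read_pagemap(...))` over each VMA to copy only the pages that changed.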

Community

In a Nutshell, CRIU...

... has had 5046 commits made by 44 contributors representing 85414 lines of code

... is mostly written in C with a very low number of source code comments

... has a young, but established codebase maintained by a large development team with stable year-over-year commits

... took an estimated 17 years of effort (COCOMO model) starting with its first commit in September 2011

https://www.ohloh.net/p/criu#

Thank you

http://criu.org

https://plus.google.com/[email protected]