Jul 18, 2015
Agenda
● History
● Under the hood
● Online (iterative) migration
● P.Haul
● How to integrate with/into CRIU
● Kernel impact
● Questions
History
● OpenVZ (2005)
– OpenVZ kernel
● Linux Checkpoint/Restart by Oren Laadan (2008)
– A non-mainline kernel
● CRIU (2011)
Timeline: OpenVZ (2005), Linux C/R (2008), CRIU (2011)
How does this work?
[Diagram: crtools sits between the kernel objects (name-spaces, files, sockets, pipes) and the process tree on one side, and the image files on the other.]
Dump
● Parasite code
– Receive file descriptors
– Dump memory content
– prctl(), sigaction, pending signals, timers, etc.
● Ptrace
– Freeze processes
– Inject parasite code
● Netlink
– Get information about sockets, netns
● Procfs
– /proc/PID/maps, /proc/PID/map_files/, /proc/PID/status, /proc/PID/mountinfo
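To make the procfs step concrete, here is a minimal sketch of parsing one line of /proc/PID/maps into a structured record. It is an illustration only, not how CRIU itself does it (CRIU is written in C); the names `Mapping`, `parse_maps_line` and `SAMPLE` are hypothetical, and the sample line is made up.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One memory mapping, as described by a /proc/PID/maps line."""
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    path: str


def parse_maps_line(line: str) -> Mapping:
    # Fields: address range, permissions, offset, device, inode,
    # and an optional pathname (absent for anonymous mappings).
    fields = line.split(maxsplit=5)
    addr, perms, offset, dev, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), path)


# Made-up sample line in the /proc/PID/maps format.
SAMPLE = "00400000-0040c000 r-xp 00000000 08:01 1234 /bin/cat\n"

if __name__ == "__main__":
    print(parse_maps_line(SAMPLE))
```

A dumper would walk every line of the file this way and then decide, per mapping, whether the contents come from a file or must be saved into the images.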
Restore
● Collect shared objects
● Restore name-spaces
● Create a process tree
– Restore SID, PGID
– Restore objects, which should be inherited
● Files, sockets, pipes, ...
● Restore per-task properties.
● Restore memory
● Sim! Sala bim!
● Awesome
Namespaces
Processes
sigreturn()
User mode | Kernel mode
Normal program flow
do_signal(), handle_signal()
setup_frame()
Signal handler
Return code on the stack
system_call(), sys_sigreturn()
restore_sigcontext()
Why integrate with CRIU
● RPC protocol and a shared library
● Plugins
● Action scripts
– lock, unlock network
– setup name-spaces
– etc
● External resources
– Bind-mounts of host file systems
– One end of a socket pair or a tty pair
– etc
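As an illustration of the action-script hook, here is a minimal sketch of a script that dispatches on the action name. It assumes that CRIU passes the action name in the CRTOOLS_SCRIPT_ACTION environment variable and that "network-lock", "network-unlock" and "setup-namespaces" are among the action names; check the CRIU documentation for the exact list. The handler bodies are hypothetical placeholders that only describe what a real script would do.

```python
import os

# Hypothetical placeholder handlers: a real script would call into the
# container runtime (firewall rules, namespace setup, ...) here.
HANDLERS = {
    "network-lock": lambda: "would block container network traffic",
    "network-unlock": lambda: "would unblock container network traffic",
    "setup-namespaces": lambda: "would configure freshly created namespaces",
}


def handle_action(action: str) -> str:
    """Run the handler for one action; unknown actions are ignored."""
    handler = HANDLERS.get(action)
    if handler is None:
        return "ignored"
    return handler()


if __name__ == "__main__":
    # Assumption: the action name arrives via this environment variable.
    print(handle_action(os.environ.get("CRTOOLS_SCRIPT_ACTION", "")))
```

Ignoring unknown actions keeps the script usable if a newer CRIU version invokes it at additional stages.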
How to integrate with CRIU
Who uses CRIU?
Online migration
● Suspend the container on the source host
● Transfer images to the remote host
– The images can be quite large, so the downtime is too long.
● Resume the container on the target host
● Transfer memory in a few iterations without freezing processes
● Freeze processes only on the last iteration
Transfer memory iteratively
Whole memory
Dirty memory
Freeze
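The iterative scheme can be sketched as a toy simulation: send the whole memory once, then keep re-sending only the pages dirtied during the previous pass, and freeze the processes only for the last, small transfer. This is a model under stated assumptions, not P.Haul's or CRIU's code: the function name and parameters are hypothetical, and the assumption that the number of dirtied pages is proportional to the amount just sent is a simplification.

```python
import random


def iterative_migrate(total_pages, dirty_fraction, max_iters, threshold, seed=0):
    """Simulate pre-copy migration; return (pages sent per live pass,
    pages left to send while frozen)."""
    rng = random.Random(seed)
    to_send = set(range(total_pages))  # first pass: whole memory
    sent_per_iter = []
    for _ in range(max_iters):
        sent_per_iter.append(len(to_send))
        # Processes keep running during the transfer and dirty some pages.
        # Assumption: the dirtied count scales with the amount just sent.
        n_dirty = int(len(to_send) * dirty_fraction)
        to_send = set(rng.sample(range(total_pages), n_dirty))
        if len(to_send) <= threshold:
            break  # small enough: stop iterating and freeze
    # Last iteration: freeze, then transfer the remaining dirty pages.
    return sent_per_iter, len(to_send)


if __name__ == "__main__":
    print(iterative_migrate(1000, 0.1, max_iters=5, threshold=20))
```

In this model the frozen transfer covers only the final dirty set, which is why the downtime shrinks compared with sending the whole memory while the container is suspended.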
P.Haul (process hauler) - Live migration using CRIU
● Iterative
● Optimal
● Customizable
# ./p.haul ovz 100 10.30.25.213
Migration succeeded
total time is ~2.86 sec
frozen time is ~1.99 sec ( ['0.27', '0.18', '1.55'] )
restore time is ~0.86 sec
img sync time is ~0.32 sec
P.Haul (Process Hauler)
Pre-dump and sync FS
Freeze, dump, sync FS
Restore
Kill
Resume
post-dump
Resume
rollback
clean up
source host destination host
New features in the kernel
● Parasite code injection (by Tejun Heo)
– Read task state that a task can otherwise retrieve only about itself
● The kcmp() system call
– Helps checking which kernel objects are shared between processes
● Proc map_files directory
– Find out what exact file is mapped
– Mappings sharing info
● A bunch of prctl extensions
– Set various private stuff on task/mm objects (c/r-only feature)
● Last-pid sysctl
– Restore task with desired PID value
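As one example from this list, the last-pid sysctl can be sketched as follows. The kernel normally gives the next task the PID after the last one allocated, so writing `desired_pid - 1` to /proc/sys/kernel/ns_last_pid just before creating a task asks for `desired_pid`. This is a minimal sketch: the helper name `request_next_pid` is hypothetical, writing the real file requires suitable privileges, and a real restorer must also prevent other tasks from being created in between, which is omitted here.

```python
NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"


def request_next_pid(desired_pid: int, path: str = NS_LAST_PID) -> None:
    """Record desired_pid - 1 as the last allocated PID, so that the
    next task created in this PID namespace should get desired_pid."""
    if desired_pid < 2:
        raise ValueError("desired_pid must be at least 2")
    with open(path, "w") as f:
        f.write(str(desired_pid - 1))


# Usage (needs privileges; racy without extra locking):
#   request_next_pid(1234)
#   pid = os.fork()   # the child is expected to get PID 1234 if it is free
```

The `path` parameter exists only so the helper can be exercised against an ordinary file.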
New features in the kernel (continued)
● Sockets information dumping via netlink (sock_diag)
– Extensible engine for retrieving socket state
● TCP repair mode
– Read the internal state of a TCP connection and reconstruct it from scratch on a freshly created socket
● Virtual net device indexes
– Allows restoring network devices in a namespace
● Socket peeking offset
– Allows peeking at socket queues (reading without removing data from the queue)
● Task memory tracking
– Incremental snapshots, online migration
Community
In a Nutshell, CRIU...
... has had 5046 commits made by 44 contributors representing 85414 lines of code
... is mostly written in C with a very low number of source code comments
... has a young, but established codebase maintained by a large development team with stable Y-O-Y commits
... took an estimated 17 years of effort (COCOMO model) starting with its first commit in September, 2011
https://www.ohloh.net/p/criu#
Thank you
http://criu.org
https://plus.google.com/
[email protected]