
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ask)

Jul 14, 2015

Andrey Vagin <[email protected]>

CRIU - Checkpoint/Restore in User-space


Agenda

● CRIU and use-cases

● History

● Current state

● Under the hood

● Kernel impact

● How to integrate with/into CRIU

● P.haul

● Questions


History

● Berkeley Lab Checkpoint/Restart (BLCR) (2003)

– Load a kernel module and link with a library

● DMTCP: Distributed MultiThreaded CheckPointing (2004-2006)

– Preload a library

● OpenVZ (2005)

– OpenVZ kernel

● Linux Checkpoint/Restart by Oren Laadan (2008)

– A non-mainline kernel

● CRIU (2011)

Timeline (figure): BLCR (2003), OpenVZ (2005), DMTCP (2007), Linux C/R (2008), CRIU (2011)


What is C/R and how can it be used?

C/R is the ability to save the state of running processes and restore them later.

Usage scenarios:

– Failure recovery

– Live migration

– RKU (seamless kernel update)

– Rollback to a previous state

– Speeding up slow-booting services

– HPC use cases


Who uses CRIU?


How does this work?

Diagram: crtools saves kernel objects (process tree, namespaces, files, sockets, pipes) into image files.
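To make the first step of a dump concrete: the dumper starts from one root PID and collects every task below it. The following is a minimal Python sketch, assuming a Linux /proc. It derives parent links from /proc/PID/stat, whereas (to our knowledge) CRIU itself reads /proc/PID/task/TID/children, so treat `read_ppid()` and `collect_tree()` as hypothetical illustration helpers.

```python
import os


def read_ppid(pid: int) -> int:
    """Return the parent PID of `pid`, parsed from /proc/PID/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is in parentheses and may itself contain
    # spaces or ')', so split only after the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[1])  # fields[0] is the state, fields[1] the PPID


def collect_tree(root: int) -> dict[int, list[int]]:
    """Map every task in the subtree rooted at `root` to its child PIDs."""
    children: dict[int, list[int]] = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            ppid = read_ppid(int(name))
        except (OSError, ValueError):
            continue  # the task exited, or its stat file is not readable
        children.setdefault(ppid, []).append(int(name))

    # Depth-first walk from the root over the parent -> children index.
    tree: dict[int, list[int]] = {}
    stack = [root]
    while stack:
        pid = stack.pop()
        tree[pid] = sorted(children.get(pid, []))
        stack.extend(tree[pid])
    return tree
```

Threads are not covered: /proc lists only thread-group leaders at its top level, so a real dumper also walks /proc/PID/task.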


Dump

● Parasite code

– Pass the task's file descriptors to CRIU

– Dump memory contents

– Dump prctl() settings, signal actions, pending signals, timers, etc.

● Ptrace

– Freeze processes

– Inject the parasite code

● Netlink

– Get information about sockets and network namespaces

● Procfs

– /proc/PID/maps, /proc/PID/map_files/, /proc/PID/status, /proc/PID/mountinfo
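The parasite runs inside the frozen task, so it can hand that task's descriptors to CRIU over a UNIX domain socket as SCM_RIGHTS ancillary data; the kernel then installs duplicates of them in the receiving process. A minimal Python sketch (3.9 or later) of just that transfer, with both ends in one process for simplicity. The real parasite is injected machine code, and the two helper names are hypothetical.

```python
import socket


def parasite_send_fds(sock: socket.socket, fds: list[int]) -> None:
    """Parasite side: ship file descriptors to the peer as SCM_RIGHTS data."""
    # At least one byte of ordinary data must accompany the ancillary message.
    socket.send_fds(sock, [b"F"], fds)


def dumper_recv_fds(sock: socket.socket, max_fds: int) -> list[int]:
    """Dumper side: receive descriptors; the kernel allocates new fd numbers."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1, max_fds)
    return fds
```

The received descriptors refer to the same open files as the originals, which is what lets the dumper inspect file positions, flags and so on from outside the task.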


Restore

● Collect shared objects

● Restore namespaces

● Create the process tree

– Restore SID, PGID

– Restore objects that must be inherited

● Files, sockets, pipes, ...

● Restore per-task properties

● Restore memory

● Sim sala bim!

● Awesome

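Sessions and process groups cannot be assigned from outside: a task becomes a session leader only by calling setsid() itself, and other members of the session must inherit it across fork(). That is why the process tree has to be recreated with the right tasks making the right calls. A minimal sketch of one such step, assuming the dumped task was a session leader; `fork_session_leader()` is a hypothetical helper.

```python
import os
import signal


def fork_session_leader() -> int:
    """Fork a child that makes itself a session and process-group leader.

    Returns the child's PID once the child has called setsid().
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                 # child
        os.close(r)
        os.setsid()              # new session, new process group, no tty
        os.write(w, b"1")        # report that setsid() has been done
        os.close(w)
        signal.pause()           # stay alive until the caller kills us
        os._exit(0)
    os.close(w)
    os.read(r, 1)                # wait for the child's report
    os.close(r)
    return pid
```

Tasks that merely belong to that session would then be forked by the leader (or its descendants) so that they inherit it, and could call setpgid() to join their process group within it.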


sigreturn()


New features in a kernel

● Parasite code injection (by Tejun Heo)

– Read task state that is otherwise available only to the task itself

● The kcmp() system call

– Helps check which kernel objects are shared between processes

● Proc map_files directory

– Find out exactly which file is mapped

– Information about shared mappings

● A bunch of prctl extensions

– Set various private fields of task/mm objects (a C/R-only feature)

● Last-pid sysctl

– Restore a task with the desired PID


New features in a kernel

● Socket information dumping via netlink (sock_diag)

– An extensible engine for retrieving socket state

● TCP repair mode

– Read the internal state of a TCP connection and reconstruct it from scratch on a freshly created socket

● Virtual net device indexes

– Allows restoring network devices in a namespace

● Socket peeking offset

– Allows peeking at socket queues (reading without removing data from the queue)

● Task memory tracking

– Incremental snapshots, online migration
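Peek offsets are what let a dumper copy a socket's receive queue without disturbing it: once SO_PEEK_OFF is set, every MSG_PEEK read starts where the previous one stopped. A minimal Python sketch for a UNIX stream socket, assuming Linux. The numeric fallback for SO_PEEK_OFF is the value from the generic Linux socket ABI (a few architectures differ), and `peek_queue()` is a hypothetical helper.

```python
import socket

# Not every Python build exposes the constant; 42 is the generic Linux value.
SO_PEEK_OFF = getattr(socket, "SO_PEEK_OFF", 42)


def peek_queue(sock: socket.socket, chunk: int = 4096) -> bytes:
    """Copy a stream socket's receive queue without consuming it."""
    sock.setsockopt(socket.SOL_SOCKET, SO_PEEK_OFF, 0)  # enable, start at 0
    parts = []
    try:
        while True:
            # Each peek continues at the current offset and advances it.
            data = sock.recv(chunk, socket.MSG_PEEK | socket.MSG_DONTWAIT)
            if not data:
                break            # peer closed and queue fully peeked
            parts.append(data)
    except BlockingIOError:
        pass                     # nothing left past the offset
    finally:
        sock.setsockopt(socket.SOL_SOCKET, SO_PEEK_OFF, -1)  # disable again
    return b"".join(parts)
```

Without the offset, repeated MSG_PEEK calls would keep returning the head of the queue, so reading a long queue in chunks would be impossible without consuming it.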


How to integrate with CRIU

● Action scripts

– Block/unblock the network

– Set up namespaces

– Post-dump and post-restore

● RPC, shared library

● Plugins
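An action script is an executable that CRIU runs at fixed points of a dump or restore; to our understanding it is selected with the --action-script option and learns the current point from the CRTOOLS_SCRIPT_ACTION environment variable. The sketch below is a hypothetical Python action script: the handler bodies are placeholders, and the action names are the ones we believe match the bullets above, so check them against the CRIU documentation for the version you use.

```python
#!/usr/bin/env python3
import os
import sys


def block_network() -> None:
    print("would block network traffic here")      # placeholder


def unblock_network() -> None:
    print("would unblock network traffic here")    # placeholder


# Assumed action names; adjust to what your CRIU version actually passes.
HANDLERS = {
    "network-lock": block_network,
    "network-unlock": unblock_network,
    "setup-namespaces": lambda: print("would configure namespaces here"),
    "post-dump": lambda: print("dump finished"),
    "post-restore": lambda: print("restore finished"),
}


def run_action(action: str) -> bool:
    """Run the handler for `action`; return False if it is not handled."""
    handler = HANDLERS.get(action)
    if handler is None:
        return False
    handler()
    return True


if __name__ == "__main__":
    run_action(os.environ.get("CRTOOLS_SCRIPT_ACTION", ""))
    # Exit 0 for unhandled actions too; we assume CRIU treats a non-zero
    # status as a failed step.
    sys.exit(0)
```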


RPC and libcriu.so

● Easy to use from other languages

– The protocol is based on protobuf messages

● Allows using CRIU for unprivileged processes

– CRIU still requires root privileges to run

– UNIX domain sockets support passing credentials

● Self-dump

– A process can request to dump itself
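Credential passing is what allows a root-owned CRIU service to act on behalf of unprivileged clients: on a UNIX domain socket the service can ask the kernel who the peer is (SO_PEERCRED) and refuse requests for processes the caller does not own. A minimal sketch of that check; the ownership rule in `may_dump()` is our assumption, not CRIU's actual policy.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the process on the other end of `conn`."""
    fmt = "iII"  # struct ucred { pid_t pid; uid_t uid; gid_t gid; }
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


def may_dump(conn: socket.socket, target_pid: int) -> bool:
    """Hypothetical rule: allow root, or a caller that owns the target."""
    _pid, uid, _gid = peer_credentials(conn)
    owner = os.stat(f"/proc/{target_pid}").st_uid
    return uid == 0 or uid == owner
```

The credentials come from the kernel, not from the message itself, so a client cannot simply claim to be someone else in its request.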


Plugins

● Unknown file types

● External dependencies

– Unix sockets (dbus, journald, rsyslog, etc.)

– Unknown character and block devices

– External bind-mounts

– External net devices

– Other external resources
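Plugins and external-resource handling start from the same question: what does each descriptor actually refer to? A rough sketch of that classification for the current process, using /proc/self/fd and fstat(); the category names are hypothetical, and CRIU's real file-type detection is far more detailed.

```python
import os
import stat


def classify_fd(fd: int) -> str:
    """Return a coarse, hypothetical category for one open descriptor."""
    target = os.readlink(f"/proc/self/fd/{fd}")
    if target.startswith("socket:"):
        return "socket"          # the peer may live outside the dumped tree
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon-inode"      # eventfd, epoll, signalfd, ...
    mode = os.fstat(fd).st_mode
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"          # unknown devices may need a plugin
    if stat.S_ISREG(mode) or stat.S_ISDIR(mode):
        return "file"
    return "other"
```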


Community


In a Nutshell, CRIU...

... has had 4,375 commits made by 36 contributors representing 58,688 lines of code

... is mostly written in C with a very low number of source code comments

... has a young but established codebase maintained by a large development team with stable Y-O-Y commits

... has an estimated cost of $787,432

https://www.ohloh.net/p/criu#


Where is CRIU now?


P.haul (process hauler) - Live migration using CRIU


● Iterative

● Optimal

● Customizable

# ./p.haul ovz 100 10.30.25.213

Migration succeeded
total time is ~2.86 sec
frozen time is ~1.99 sec ( ['0.27', '0.18', '1.55'] )
restore time is ~0.86 sec
img sync time is ~0.32 sec


P.haul

Migration steps (diagram): Pre-dump and sync FS; Freeze, dump, sync FS; Restore; Kill; Resume; Post-dump; Resume
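The iterative part of P.haul follows the usual pre-copy pattern: memory is pre-dumped and transferred while the container keeps running, each round sends only the pages dirtied since the previous one (via task memory tracking), and the final freeze happens once another round is unlikely to help. A sketch of such a stopping rule; the function and its thresholds are hypothetical and are not P.haul's actual heuristics.

```python
def should_predump_again(history: list[int], max_rounds: int = 8,
                         small_enough: int = 1000) -> bool:
    """Decide whether to run another pre-dump round before freezing.

    `history[i]` is the number of pages left dirty after pre-dump round i.
    """
    if not history:
        return True                      # always start with one pre-dump
    if len(history) >= max_rounds:
        return False                     # bound the total migration time
    if history[-1] <= small_enough:
        return False                     # little left: freeze and dump now
    if len(history) >= 2 and history[-1] >= history[-2]:
        return False                     # not converging: freeze now
    return True
```

A caller would run pre-dump rounds in a loop while `should_predump_again(history)` returns True, appending each round's dirty-page count, and then freeze and perform the final dump. Bounding the number of rounds matters because a workload that dirties memory faster than it can be copied never converges.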


Thank you

http://criu.org