Recent advances in the Linux kernel resource management
Kir Kolyshkin, OpenVZ, kir@openvz.org

May 08, 2018

Recent advances in the Linux kernel resource management

Kir Kolyshkin, kir@openvz.org

Agenda
● Resources to account and control
● Some background on containers
● Existing functionality, shortcomings
● Control Groups a.k.a. cgroups
● Memory Controller
● Future work

Resources

Why?
● All resources are finite
● Multiple tasks and users
● Need usage statistics / bookkeeping
● Need Denial of Service protection
● Need Quality of Service level (not only limits but guarantees)

What?
● CPU
● Memory (RAM)
● Swap
● Disk space
● Disk I/O
● Network

Resources: CPU
CPU is given to tasks in time slices:
● CPU shares/weights
● CPU limits
● for SMP: CPU affinity
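Under contention, shares/weights translate into proportional CPU time. A minimal sketch, assuming every group is always runnable and using 1024 (the cgroup v1 `cpu.shares` default) as the reference weight; `cpu_split` is a hypothetical helper, not a kernel interface:

```shell
#!/bin/sh
# Hypothetical helper: given one weight per group, print the percentage of
# CPU each group gets when ALL groups are busy (integer percent, rounded down).
# Assumption: proportional-share scheduling; idle groups are not modelled.
cpu_split() {
  total=0
  for w in "$@"; do
    total=$((total + w))
  done
  for w in "$@"; do
    echo $((w * 100 / total))
  done
}

# Two groups at the default weight (1024) and one at half weight (512)
cpu_split 1024 1024 512
```

Shares only matter under contention: when a group is idle, its unused time goes to the busy groups, which is why shares act as relative guarantees while CPU limits are a separate knob.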

Resources: Memory & swap
● User memory
  – Virtual and physical (RSS) memory
  – Dirty page cache
● Kernel memory
  – Various objects, different allocators
  – Special case: network buffers
● Swap space

Resources: disk
● Disk space
● Disk I/O bandwidth
  – read/write
  – mmap()
  – swapin/swapout
  – Problem: most of I/O is async

Resources: networking
● Network bandwidth: solved by tc
● Traffic Control:
  – Shaping
  – Scheduling
  – Policies
  – Dropping

Containers

What are containers?
● Multiple isolated userspace instances
● Running on top of a single kernel
● Like VMs but very lightweight: native performance, low overhead

Container implementations
● OpenVZ
● Parallels Virtuozzo Containers
● FreeBSD jails
● Linux-VServer
● Solaris 10 Containers/Zones
● IBM AIX6 WPARs (Workload Partitions)

Containers cont'd
● Multiple containers should peacefully co-exist, so they need DoS protection
● From the resource management point of view, containers are just groups of processes

Existing mechanisms

Disk Quota
● Per-mount-point disk quota for users and groups
● Soft limits, hard limits, grace periods
● Can see the current usage
● Can be increased/decreased on the fly
● Applications expect to run out of disk space (or at least should)
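How soft limits, hard limits and grace periods interact can be modelled in a few lines: a user may exceed the soft limit only for the grace period, after which the soft limit is enforced like the hard one. This is a hypothetical model (`quota_check` and its argument order are made up for illustration), not the quota tools' code:

```shell
#!/bin/sh
# Hypothetical model of disk quota semantics (not the quota tools' code).
# Arguments (all integers): usage soft hard over_since now grace
#   usage      - blocks the user would have after the write
#   over_since - time the user first went over the soft limit (0 if never)
#   now, grace - current time and grace period, in seconds
quota_check() {
  usage=$1 soft=$2 hard=$3 over_since=$4 now=$5 grace=$6
  if [ "$usage" -gt "$hard" ]; then
    echo deny   # the hard limit can never be exceeded
  elif [ "$usage" -le "$soft" ]; then
    echo ok     # within the soft limit
  elif [ "$over_since" -ne 0 ] && [ $((now - over_since)) -gt "$grace" ]; then
    echo deny   # grace period expired: the soft limit now acts as a hard one
  else
    echo warn   # over the soft limit, still inside the grace period
  fi
}

# soft=1000, hard=1200 blocks, grace = 7 days (604800 s)
quota_check 900  1000 1200 0    0      604800   # within soft limit
quota_check 1100 1000 1200 1000 2000   604800   # over soft, in grace
quota_check 1100 1000 1200 1000 700000 604800   # over soft, grace expired
quota_check 1300 1000 1200 0    0      604800   # over hard limit
```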

CPU
● Per-process nice value which can be changed on the fly (nice, renice)
● Real-time priority queue
● Hard CPU time limit (ulimit -t)
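Both mechanisms can be tried from an unprivileged shell. A minimal sketch, assuming a Linux system with coreutils (or busybox) `nice`: an unprivileged user may only raise niceness, and a soft CPU time limit makes the kernel send SIGXCPU, which by default terminates the process:

```shell
#!/bin/sh
# `nice` with no command prints the current niceness.
base=$(nice)
# Run `nice` itself 5 steps "nicer"; unprivileged users may only raise
# niceness (lower priority), and the value is capped at 19.
child=$(nice -n 5 nice)
echo "niceness: current=$base child=$child"

# CPU time limit: a busy loop gets 1 second of CPU time (soft limit only).
# Reaching the soft limit sends SIGXCPU; core dumps are disabled so the
# default action does not leave a core file behind.
if ( ulimit -c 0; ulimit -S -t 1; exec sh -c 'while :; do :; done' ); then
  status=0
else
  status=$?   # typically 128 + the SIGXCPU signal number
fi
echo "busy loop exit status: $status"
```

Note that this limit counts CPU seconds of a single process, so it cannot express "this group of processes may use N% of the CPU".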

ulimit
● setrlimit()/getrlimit() syscalls
● Controls 16 different resources: core file size, data seg size, scheduling priority, file size, pending signals, max locked memory, max memory size, number of open files, pipe size, POSIX message queues, real-time priority, stack size, cpu time, max user processes, virtual memory, file locks
● Soft limits and hard limits
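Soft and hard limits can be inspected with the shell's `ulimit` builtin. A minimal sketch using the open-files limit (RLIMIT_NOFILE); it also shows that limits belong to a process and its children, since a change made in a subshell does not reach the parent:

```shell
#!/bin/sh
# Soft and hard limits for open files (RLIMIT_NOFILE).
soft=$(ulimit -S -n)
hard=$(ulimit -H -n)
echo "parent: soft=$soft hard=$hard"

# Command substitution runs in a subshell: lowering the soft limit there
# affects only that subshell (and anything it would start).
sub=$( ulimit -S -n 64; ulimit -S -n )
echo "subshell: soft=$sub"
echo "parent again: soft=$(ulimit -S -n)"
```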

ulimit: problems
● Not all resources are covered
● Ulimits are set in the current context
  – the only good place to set them is at login
  – some can only be decreased at run time
● All limits are per-process
  – only NPROC is per-UID
● Current usage values are unknown
● Memory limits are mostly ignored
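The rlimit interface reports limits, not usage. For open files, one Linux-specific workaround is to count the entries in `/proc/<pid>/fd`; `fd_usage` below is a hypothetical helper sketching that idea:

```shell
#!/bin/sh
# Hypothetical helper: number of open file descriptors of a process
# (default: this shell), obtained by counting /proc/<pid>/fd entries.
fd_usage() {
  pid=${1:-$$}
  set -- "/proc/$pid/fd"/*
  echo $#
}

echo "shell $$ has $(fd_usage) open fds; soft limit is $(ulimit -S -n)"
```

This needs a separate, resource-specific trick for each limit, and still says nothing about usage summed over a group of processes.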

Control Groups

Control Groups
● A generic mechanism for grouping tasks into hierarchical groups
● Multiple resource controllers
● Possible to have different groups for different controllers
● Managed via cgroup filesystem

Control Groups: interface
Managed via cgroup filesystem:
mkdir /dev/cgroup
mount -t cgroup none /dev/cgroup
mkdir /dev/cgroup/0
cd /dev/cgroup/0
echo $$ > tasks
cat /proc/self/cgroup
/etc/init.d/httpd start
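The `cat /proc/self/cgroup` step prints, for every mounted hierarchy, which group the current process belongs to. Each line has the form `hierarchy-ID:controller-list:cgroup-path`. A minimal sketch that parses it without root (on current systems the hierarchies are usually mounted under `/sys/fs/cgroup` rather than `/dev/cgroup`, and a cgroup v2-only system shows a single `0::/path` line); `show_cgroups` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the cgroup membership of a process
# (default: the shell itself). Each /proc/<pid>/cgroup line is
#   hierarchy-ID:controller-list:cgroup-path
show_cgroups() {
  while IFS=: read -r id ctrls path; do
    echo "hierarchy=$id controllers=${ctrls:-none} path=$path"
  done < "/proc/${1:-self}/cgroup"
}

show_cgroups
```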

Control Groups: history
● A feature known as cpusets was developed by the big-iron folks at Bull/SGI
● Used to bind groups of processes to NUMA nodes (affinity)
● Paul Menage generalized it
● Now cpusets is just one of the resource controllers

Memory Controller

Memory controller
● User memory:
  – RSS
  – Page cache
● Reclamation
  – Same as try_to_free_pages()
● OOM killer

User Memory

VMAs classification
● unreclaimable: private and anonymous
● reclaimable: shared file mappings

Pages classification
● unused: parts of mapped regions
● used: touched pages

[Figure: task address space split into unreclaimable and reclaimable VMAs and into unused and used pages, with the “lengths of mappings” resource and the “RSS” resource marked on it]
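The two resources from the figure can be observed for any process through `/proc/<pid>/smaps`, where each mapping reports its length (`Size:`) and its resident, touched part (`Rss:`), both in kB. A minimal Linux-only sketch; `vma_vs_rss` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print "<lengths of mappings> <RSS>" in kB for a
# process, summing the Size: and Rss: fields over all of its VMAs.
vma_vs_rss() {
  awk '/^Size:/ { size += $2 }
       /^Rss:/  { rss  += $2 }
       END      { print size, rss }' "/proc/${1:-self}/smaps"
}

vma_vs_rss $$
```

The first number is usually much larger than the second: most of a typical address space is mapped but never touched, which is why limiting mapping lengths and limiting RSS are different controls.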

MemCtrl: interface
# echo 4M > memory.limit_in_bytes
# cat memory.limit_in_bytes
4194304
# cat memory.usage_in_bytes
172032
# cat memory.max_usage_in_bytes
294912
# cat memory.failcnt
0
# cat memory.stat
....
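Reading these files is enough to derive simple figures such as the remaining headroom (limit minus usage). A minimal sketch of a hypothetical `memcg_report` helper for a cgroup v1 memory controller directory; so that it runs without root or a mounted memory cgroup, the demo points it at a mock directory filled with the values shown above:

```shell
#!/bin/sh
# Hypothetical helper: summarize a cgroup v1 memory controller directory
# using the files shown on the slide (all values in bytes).
memcg_report() {
  dir=$1
  limit=$(cat "$dir/memory.limit_in_bytes")
  usage=$(cat "$dir/memory.usage_in_bytes")
  peak=$(cat "$dir/memory.max_usage_in_bytes")
  fails=$(cat "$dir/memory.failcnt")
  echo "limit=$limit usage=$usage peak=$peak headroom=$((limit - usage)) failcnt=$fails"
}

# Demo against a MOCK directory holding the values from the slide
# (4M = 4 * 1024 * 1024 = 4194304 bytes).
demo=$(mktemp -d)
echo 4194304 > "$demo/memory.limit_in_bytes"
echo 172032  > "$demo/memory.usage_in_bytes"
echo 294912  > "$demo/memory.max_usage_in_bytes"
echo 0       > "$demo/memory.failcnt"
memcg_report "$demo"
rm -r "$demo"
```

On a real system the argument would be the group's directory under the memory cgroup mount point.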

Shared Pages accounting
● Shared code/library segments
● Approaches:
  – Charge to the first user only (unfair)
  – Charge to all users (incorrect totals)
  – Charge a fraction to every user
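The trade-offs between the three approaches show up as soon as one page is shared by several containers. A minimal sketch comparing them for a single page; `charge_shared_page` is a hypothetical helper, and charges are expressed in 1/1024ths of a page so that integer arithmetic can represent fractions:

```shell
#!/bin/sh
# Hypothetical comparison of the three charging policies for ONE shared page.
# Charges are in 1/1024ths of a page.
# Usage: charge_shared_page first|all|fraction CONTAINER...
charge_shared_page() {
  policy=$1; shift
  n=$#; first=$1; total=0
  for c in "$@"; do
    case $policy in
      first)    if [ "$c" = "$first" ]; then q=1024; else q=0; fi ;;
      all)      q=1024 ;;
      fraction) q=$((1024 / n)) ;;   # exact only when n divides 1024
      *)        echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    echo "$c $q"
    total=$((total + q))
  done
  echo "total $total"
}

for p in first all fraction; do
  echo "== $p"
  charge_shared_page "$p" C1 C2 C3 C4
done
```

Only "first" and "fraction" keep the total at one page (1024), and only "fraction" spreads it evenly across the four users.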

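Page fractions accounting

[Figure: a page shared by containers C1, C2, C3 and C4, with the page's charge split into fractions of ½ and ¼]

Algorithm benefits
● O(1) algorithm for adding and removing a user of a page
● The sum of RSS over all beancounters equals the number of pages actually in use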
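The slides do not spell out the algorithm itself. A minimal sketch of one way to get O(1) updates with power-of-two fractions, under an assumed policy: a new user takes half of the charge of the longest-waiting holder, and a departing user hands its charge to one survivor. This is NOT the OpenVZ implementation; `page_fractions` is a hypothetical helper, and charges are in 1/1024ths of a page:

```shell
#!/bin/sh
# Hypothetical sketch of power-of-two "page fraction" charging for ONE page.
# NOT the OpenVZ implementation: the split/merge policy is an assumption
# chosen so that each event costs O(1) (removal is amortized O(1)).
# stdin:  one event per line, "map <container>" or "unmap <container>"
# stdout: "<container> <charge>" for every current user of the page, with
#         the charge in 1/1024ths of a page (10 halvings stay exact).
page_fractions() {
  awk '
    function push(c) { q[++tail] = c }
    # Pop the oldest queued holder, lazily skipping containers that unmapped.
    function pop(   c) {
      while (head < tail) {
        c = q[++head]
        if (c in share) return c
      }
      return ""
    }
    $1 == "map" && !($2 in share) {
      if (n == 0) { share[$2] = 1024; push($2); n = 1; next }
      h = pop()
      share[h] /= 2            # oldest holder gives away half its charge...
      share[$2] = share[h]     # ...to the new user
      push(h); push($2)        # both move to the back of the queue
      n++
      next
    }
    $1 == "unmap" && ($2 in share) {
      left = share[$2]
      delete share[$2]
      if (--n == 0) next
      h = pop()
      share[h] += left         # hand the departing charge to one survivor
      push(h)
      next
    }
    END { for (c in share) print c, share[c] }
  ' | sort
}

# Three users: charges of 1/4, 1/2 and 1/4 of the page.
printf 'map C1\nmap C2\nmap C3\n' | page_fractions
# Four users (1/4 each), then C2 unmaps and its 1/4 goes to C1.
printf 'map C1\nmap C2\nmap C3\nmap C4\nunmap C2\n' | page_fractions
```

Every event touches only a constant number of holders, and the charges always add up to exactly one page, which is what keeps the sum of RSS across beancounters equal to the pages actually in use.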

Future

Future a.k.a. TODO
● Shared pages accounting
● VMA (user mappings) length control
● Kernel memory controller
● cgroups checkpoint/restart
● Per-cgroup I/O priorities
● All of that is available in OpenVZ; it needs to be ported to mainline

More Info

/usr/src/linux/Documentation/cgroups/*

/usr/src/linux/Documentation/controllers/*

[email protected]

Questions? Comments?

kir@openvz.org

Booth #63