Helping Users Maximize VM Performance Martin Polednik (@mpolednik) Software Engineer @ Red Hat

Helping Users Maximize VM Performance · 2017-12-22 · Machine Types 2% 21% 77% pc-i440fx-rhel7.2.0 pc-i440fx-rhel7.3.0 rhel6.5.0 • clusters "group" VMs by machine type • updating

Jun 28, 2020

Helping Users Maximize VM Performance

Martin Polednik (@mpolednik) Software Engineer @ Red Hat

The Data

• oVirt databases from sosreports

• ~ 40,000 virtual machine (VM) definitions

• ~ 700 clusters*

• ~ 2,200 hosts

• ~ 60,000 disks

* oVirt specific entity that consists of hosts, VMs, disks, networks etc. Consider it a scheduling domain.

Machine Types

[Pie chart: machine type distribution. Legend: pc-i440fx-rhel7.2.0, pc-i440fx-rhel7.3.0, rhel6.5.0. Slice labels: 2%, 21%, 77%]

• clusters "group" VMs by machine type

• updating to a newer cluster is a nontrivial process

NUMA

• soft violation: VM does not fit within some of the host's NUMA nodes

• example: VM 0:NODE 0 doesn't fit within HOST 0:NODE 1

• could be solved by pinning

[Diagram: VM 0 with NODE 0 (48 GiB); HOST 0 with NODE 0 (64 GiB) and NODE 1 (32 GiB)]

Soft NUMA Violations

• 17.01 % of VM definitions

• the query considered scheduling domains (clusters)

• "there exists a host in the cluster whose NUMA node is smaller than the NUMA node of the VM"

• worst case in cluster AND host scheduling

NUMA

• hard violation: VM does not fit within any of the host's NUMA nodes

• example: VM 0:NODE 0 doesn't fit within HOST 0:NODE 0 or HOST 0:NODE 1

[Diagram: VM 0 with NODE 0 (48 GiB); HOST 0 with NODE 0 (32 GiB) and NODE 1 (32 GiB)]

Hard NUMA Violations

• 9.74 % of VM definitions

• scheduling domains were considered

• "there exists a host in the cluster whose NUMA nodes are smaller than the NUMA node of the VM"

• worst case in cluster scheduling
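
The soft and hard definitions above can be sketched as a small check. This is only an illustration, not the query used on the oVirt databases: the `Host` class, the function names and the "fits if the host node is at least as large" rule are assumptions, and a case that qualifies as hard is reported as hard only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    node_sizes_gib: List[int]  # memory of each host NUMA node (hypothetical model)


def classify_numa_fit(vm_node_gib: int, host: Host) -> str:
    """Classify how one VM NUMA node fits on one host."""
    fitting = [size for size in host.node_sizes_gib if size >= vm_node_gib]
    if not fitting:
        return "hard"  # fits within none of the host's NUMA nodes
    if len(fitting) < len(host.node_sizes_gib):
        return "soft"  # fits within some of the host's NUMA nodes, not all
    return "ok"


def worst_case_in_cluster(vm_node_gib: int, hosts: List[Host]) -> str:
    """Worst classification over every host the scheduler could pick."""
    severity = {"ok": 0, "soft": 1, "hard": 2}
    results = (classify_numa_fit(vm_node_gib, host) for host in hosts)
    return max(results, key=severity.__getitem__, default="ok")
```

With the numbers from the diagrams, a 48 GiB VM node on a host with 64 GiB and 32 GiB nodes is a soft violation, and on a host with two 32 GiB nodes it is a hard violation.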

Solution

• warn the user about suboptimal NUMA topology

• easy to determine on the cluster level

• important for specific applications (huge DBs)

• future: create the nodes automatically?

NUMA & CPU pinning

• low adoption, why?

• no migration (disabled at management level)

• HA is hard, breaks cluster logic (only HA between subset of hosts)

• limited scheduling (pin to host)

• can we change that?

NUMA & CPU pinning

• host-passthrough CPU (aka copy features)

• automatically pin CPUs

• e.g. 4 NUMA nodes, 12 CPUs per node

• node CPU0, CPU1 ~> "service" CPUs (emulation thread, IO thread, virt daemons)

• CPU2 through CPU11 ~> compute CPUs

• if #vCPU > 10, ask the user to add a virtual node

• easy to think about RT too!
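
A minimal sketch of the split described above, under the assumption that host CPU ids are numbered contiguously per node (real hosts do not guarantee this). The function names and the returned layout are hypothetical, not oVirt code.

```python
from typing import Dict, List

SERVICE_CPUS_PER_NODE = 2  # CPU0 and CPU1 of each node, as in the example above


def split_node_cpus(nodes: int, cpus_per_node: int) -> Dict[int, Dict[str, List[int]]]:
    """Split each node's CPUs into "service" and "compute" sets.

    Assumes contiguous CPU numbering per node (an assumption of this sketch).
    """
    layout = {}
    for node in range(nodes):
        first = node * cpus_per_node
        cpus = list(range(first, first + cpus_per_node))
        layout[node] = {
            "service": cpus[:SERVICE_CPUS_PER_NODE],   # emulator/IO threads, daemons
            "compute": cpus[SERVICE_CPUS_PER_NODE:],   # candidates for vCPU pinning
        }
    return layout


def virtual_nodes_needed(vcpus: int, cpus_per_node: int) -> int:
    """Virtual NUMA nodes needed so each one fits a node's compute CPUs."""
    compute = cpus_per_node - SERVICE_CPUS_PER_NODE
    return -(-vcpus // compute)  # ceiling division
```

For 4 nodes with 12 CPUs each, every node keeps 2 service CPUs and 10 compute CPUs, so a VM with more than 10 vCPUs needs a second virtual node.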

Hugepages

• platform default + extended sizes

• either preallocated or dynamically allocated

• at least for x86_64 1 GiB (pdpe1gb) preferred, other sizes configurable

• THP is hit or miss performance-wise

Hugepages

• no cluster-level overcommit

• no memory hot(un)plug, limited migration (management layer constraints)

• "hard" resource limit

• NUMA-aware allocation

Hugepages Allocation

• could cause VM start delays

• opt-out at the host level, disabled in scheduler

• reserved hugepages concept (DPDK etc.)

• max(vm_hugepages - free_hugepages, 0)
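
One reading of the formula above is "how many hugepages still have to be allocated before the VM can start". The sketch below follows that reading; the helper that converts VM memory to a page count is an assumption added for illustration.

```python
def vm_hugepages_needed(vm_memory_mib: int, hugepage_size_mib: int) -> int:
    """Hugepages required to back the VM's memory (rounded up)."""
    return -(-vm_memory_mib // hugepage_size_mib)  # ceiling division


def hugepages_to_allocate(vm_hugepages: int, free_hugepages: int) -> int:
    """Pages to allocate dynamically: max(vm_hugepages - free_hugepages, 0)."""
    return max(vm_hugepages - free_hugepages, 0)
```

A 4 GiB VM on 1 GiB pages needs 4 pages; if 16 pages are already free nothing is allocated, and if only 1 is free, 3 more are needed.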

L3 cache

• https://git.qemu.org/?p=qemu.git;a=commit;h=14c985cffa6cb177fc01a163d8bcf227c104718c

• QEMU: -cpu foo,l3-cache=on

• libvirt: <cpu><cache level='3' mode='emulate'/></cpu>

• fewer inter-processor interrupts (IPIs) -> fewer VMEXITs

• essential for SAP workloads
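
As an illustration, the libvirt fragment above can be generated with the Python standard library. The function name and the optional `cpu_mode` parameter are assumptions of this sketch; combining the cache element with `host-passthrough` is only one possible configuration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def cpu_element_with_l3_cache(cpu_mode: Optional[str] = None) -> str:
    """Build a <cpu> element that asks libvirt to emulate an L3 cache."""
    attributes = {"mode": cpu_mode} if cpu_mode else {}
    cpu = ET.Element("cpu", attributes)
    # Same element as on the slide: <cache level='3' mode='emulate'/>
    ET.SubElement(cpu, "cache", {"level": "3", "mode": "emulate"})
    return ET.tostring(cpu, encoding="unicode")
```

Calling `cpu_element_with_l3_cache("host-passthrough")` returns a string that can be placed into a domain definition.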

Disk Interface

• choice between IDE, VirtIO-blk, VirtIO-SCSI (+ passthrough)

• cluster versions 3.6 and 4.0 default to VirtIO-blk, 4.1+ to VirtIO-SCSI

• VirtIO-SCSI controller by default in VMs (hotplug capability) :(

• TRIM is important to people!

Disk Interface

[Bar chart: share of disks (0% to 100%) by interface (IDE, VirtIO (blk), VirtIO (scsi)) for cluster versions 3.6, 4.0 and 4.1]

IO Threads

• 3.6, 4.0, 4.1 allow specifying # of IO threads

• no hints about which number to use

IO Threads

[Bar chart: share of VMs (0.0% to 1.2%) by number of IO threads (1, 2, 3, 4, 10, 16)]

IO Threads

• testing has shown the "sweet spot" to be 1 IO thread

• therefore, oVirt no longer (easily) allows arbitrary numbers

• override via hooks

• https://mpolednik.github.io/2017/01/23/virtio-blk-vs-virtio-scsi/

VirtIO RNG

• "low hanging fruit"

• improves virtually any operation that uses PRNG (e.g. OS installation, GPG key generation)

• optional in 3.6, 4.0, default in 4.1 - no downsides?

VirtIO RNG perf

[Bar chart: rngtest duration in seconds (axis 0 to 90), virtio-rng vs. no virtio-rng]

VirtIO RNG

[Bar chart: share of VMs (0% to 16%) for cluster versions 3.6, 4.0 and 4.1]

Host Devices

• using real hardware to accelerate the VMs

• GPUs, NICs, NVMe disks

• reduced CPU load

• should still honor NUMA locality

• hard resource limit

Host Devices

[Diagram: HOST 0 with two 32 GiB NUMA nodes (both labeled NODE 0 on the slide); GPU 0 and GPU 1 attached to one node, GPU 2 and GPU 3 to the other]

Host Devices

• easy to tune NUMA automatically for the simple case (all host devices within a single NUMA node)

• more complicated if host devices originate from multiple NUMA nodes
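
The simple case can be detected with a few lines. This is a sketch under the assumption that the NUMA node of each assigned host device is already known; the function name and the device-to-node mapping are hypothetical.

```python
from typing import Dict, Optional


def preferred_numa_node(device_nodes: Dict[str, int]) -> Optional[int]:
    """Return the NUMA node shared by all host devices, or None.

    None covers both "no devices" and "devices spread over several nodes",
    where automatic tuning is not straightforward.
    """
    nodes = set(device_nodes.values())
    if len(nodes) == 1:
        return nodes.pop()
    return None
```

For example, a VM using only `GPU 0` and `GPU 1` from the same node gets that node back, while a VM using GPUs from both nodes gets `None` and needs a manual decision.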

Network

[Pie chart: NIC types (VirtIO, SR-IOV, e1000). Slice labels: 1%, 99%]

• VirtIO is the preferred "flexibility" choice

• SR-IOV for performance/NFV, migration enabled

• emulated NICs for compatibility

• looks good as it is

Migration Performance

• relevant for clusters

• maximum downtime incremented in steps

• limit number of inbound/outbound migrations to avoid oversaturated network

• post copy - needs to be enabled explicitly, success chance dependent on user's network

• don't expect high bandwidth, redundant network in every case

Migration Performance

Scenario                      Legacy                Minimal downtime    Suspend workload if needed   Post copy
20 GiB RAM                    failed after 12 min   41 min 31 sec       31 min 42 sec                25 min
20 GiB RAM, 50 msec latency   failed after 17 min   47 min 24 sec       1 h 12 min 31 sec            48 min 10 sec
40 VMs, 1 GiB RAM             avg 1 min 40 sec      avg 1 min 50 sec    avg 4 min                    avg 1 min 30 sec

KSM

• hugetlbfs not scanned by ksmd

• no overcommit for VMs that are considered high performance

• waste of CPU cycles?

Devices

• graphics, video, USBs, smartcard, watchdog, balloon

• do we need them?

• no known (to us) performance effects

• removing them shouldn't hurt

• no data though

Devices

• some functionality tradeoffs (balloon and memory hot(un)plug in the future)

• running headless

• no graphics

• no video

• no spice/vnc, just console connectivity

• console proxy to connect to the guests

Implementation

• do as many "safe" tweaks as possible

• with a single NUMA node, go for device locality

• warn about suboptimal configuration

• NUMA violation => suggest a vNODE

• inform about tradeoffs

• VirtIO-blk vs VirtIO-SCSI

• allow user to override as many tunes as possible!

Benchmarks

• synthetic benchmarks show 0-15 % performance improvement

• pgbench ~ 10 % improvement

• pts/encode-flac ~ 0.1 % improvement

• more data in the future as reports come in

Summary

• align everything with NUMA topology

• suggest pinning where possible (incl. IO thread, emulator thread)

• suggest hugepages

• expose L3 cache

• VirtIO-RNG

• host devices (hardware) > VirtIO > emulation

• remove unneeded devices

Summary

• benchmark your workload and tune accordingly!

Questions? Thank you!

Slides & Blog @ https://mpolednik.github.io/