Kunpeng BoostKit for Virtualization

Tuning Guide

Issue 11

Date 2021-10-15

HUAWEI TECHNOLOGIES CO., LTD.

Copyright © Huawei Technologies Co., Ltd. 2022. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Issue 11 (2021-10-15) Copyright © Huawei Technologies Co., Ltd. i

Contents

1 Docker Tuning Guide
1.1 Introduction
1.2 Tuning the Hardware
1.3 Tuning the OS
1.3.1 NIC Interrupt Affinity
1.4 Tuning a Docker Container
1.4.1 NUMA Affinity

2 KVM Tuning Guide
2.1 Introduction
2.2 Tuning the Hardware
2.2.1 Checking Hardware Validity
2.2.2 Modifying BIOS Settings
2.2.3 Tuning the DIMM Configuration
2.3 Tuning the OS
2.4 Tuning the Virtualization Settings
2.4.1 Binding the VM to Cores
2.4.2 Setting 1:1 Core Binding and Same-Die Memory Access
2.4.3 Configuring the VM NIC
2.4.4 Using virtio-blk for VM Storage
2.4.5 Using Huge Pages for the VM
2.5 Using the Automatic Tuning Tool HiKVMPerf
2.5.1 Application Scenario
2.5.2 Usage Guidelines

3 OpenStack Tuning Guide (CentOS 7.6)
3.1 Introduction
3.1.1 OpenStack Overview
3.1.2 Environment
3.1.3 Tuning Guidelines
3.1.4 Tuning Process Flow
3.2 Tuning Hardware
3.2.1 Modifying BIOS Settings
3.2.2 Configuring the Memory
3.3 OS Tuning
3.3.1 Checking the OS and Kernel
3.3.2 Binding Service Processes to Cores
3.3.3 Configuring the NIC
3.4 VM Tuning
3.4.1 Binding the VMs to Cores
3.4.2 Configuring the NUMA Affinity
3.4.3 Configuring the Memory Huge Page Function
3.4.4 Upgrading GCC

4 Kunpeng BoostKit for Virtualization Performance Tuning Guide
4.1 Introduction
4.2 CentOS 7.6 Tuning
4.2.1 Installing QEMU
4.2.2 Dual-layer Scheduling
4.3 Low-Load V-Turbo Tuning
4.4 Tuning in Resource Fragmentation Scenarios
4.4.1 Enabling Memory Interleaving
4.4.2 Configuring Guest NUMA
4.5 Change History

A Change History

1 Docker Tuning Guide

1.1 Introduction

1.2 Tuning the Hardware

1.3 Tuning the OS

1.4 Tuning a Docker Container

1.1 Introduction

Docker Container Architecture

Docker uses a client-server architecture, as shown in Figure 1-1. The Docker client communicates with the Docker daemon, which is responsible for building, running, and distributing Docker containers. The Docker client and Docker daemon can run on the same system, or you can connect the Docker client to a remote Docker daemon. The client and daemon communicate through a REST API, over UNIX sockets or a network interface.

Figure 1-1 Docker container architecture

Tuning Process Flow

This document describes how to adjust the hardware parameters, OS, and Docker container settings on a TaiShan server to achieve optimal Docker container performance.

NOTE

Install Docker first. For details, see Docker Installation Guide.

Figure 1-2 shows the process of performance tuning on the Docker virtualization platform.

Figure 1-2 Docker container performance tuning

Top N Tuning Items

The following table lists the most frequently used tuning items that affect or improve the Docker container performance. You can select proper tuning items based on your requirements to achieve optimal performance.

Tuning Item | Description | Application Scenario | Remarks
BIOS | Set the memory refresh rate to Auto. | For commercial use | Change the memory refresh rate to Auto to improve memory bandwidth performance.
NUMA affinity | Ensure that the CPUs and memory bound to a Docker container are in the same physical node to prevent cross-die and cross-chip memory access. | For commercial use | Significantly improves Docker container computing performance.
CPU binding | Bind each Docker vCPU to a CPU. | For commercial use | -
CPU binding | Bind Docker vCPUs to CPUs in a CPU cluster. | For POC tests only | Significantly improves Docker container computing performance.
NIC interrupt affinity | Bind each interrupt to the CPU of the NUMA node where the physical NIC is located. | For commercial use | -

1.2 Tuning the Hardware

Checking Hardware Validity

Before installing and deploying the Docker container environment, ensure that:

● The hardware (including servers and NICs) is in the TaiShan server compatibility list.

● The OS is in the TaiShan server compatibility list. This document uses CentOS 7.6 as an example.

● The software (including the BIOS and BMC) is the latest version released on the Huawei enterprise support website.

Tuning the BIOS Configuration

● Purpose

You can set some advanced options in the BIOS to effectively improve the performance of a Docker container. Table 1-1 describes the recommended performance-related BIOS configuration option on TaiShan servers.

Table 1-1 BIOS performance configuration options

BIOS Configuration Option | Recommended Value | Description
Custom Refresh Rate | Auto | Memory refresh rate. The default value is 32 ms. Choose Advanced > Memory Config > Custom Refresh Rate.

● Method

Step 1 Restart the server, enter the BIOS, and choose Advanced > Memory Config > Custom Refresh Rate.

Step 2 Set Custom Refresh Rate to Auto and press F10 to save the BIOS settings.

----End

Tuning the DIMM Configuration

Use the Intelligent Computing Product Memory Configuration Assistant to configure DIMMs.

NOTE

The DIMM installation method recommended by the Intelligent Computing Product Memory Configuration Assistant is the optimal configuration. Asymmetric DIMM installation will cause frequent cross-NUMA memory access, resulting in performance deterioration.

1.3 Tuning the OS

1.3.1 NIC Interrupt Affinity

When a Docker container uses the NIC passthrough function, you can bind NIC interrupts to CPUs to maximize the network performance and avoid network bottlenecks.

Confirming the NIC Used

Step 1 Check the required NIC.

ifconfig

Step 2 Check the NUMA node where the NIC is located, and bind the NIC interrupt ID to the CPU of the NUMA node.

cat /sys/class/net/XXX/device/numa_node

NOTE

XXX indicates the network port name. For example, the network port enp125s0f0 is in NUMA node 0.

----End
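The lookup above can also be scripted. The following Python sketch is illustrative only and is not part of the original guide: it reads the same numa_node file and then expands the node's CPU list from /sys/devices/system/node/nodeN/cpulist. The function names and the sysfs parameter are assumptions introduced for this example. Note that the kernel may report -1 when a device has no NUMA node, in which case node_cpus should not be called.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def nic_numa_node(iface, sysfs="/sys"):
    """Return the NUMA node of a network port (-1 if the kernel reports none)."""
    path = Path(sysfs, "class/net", iface, "device/numa_node")
    return int(path.read_text())


def node_cpus(node, sysfs="/sys"):
    """Return the CPU IDs that belong to the given NUMA node."""
    path = Path(sysfs, "devices/system/node", f"node{node}", "cpulist")
    return parse_cpu_list(path.read_text())
```

On a host, node_cpus(nic_numa_node("enp125s0f0")) would give the candidate CPUs for the interrupt binding described in the next procedure.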

Binding NIC Interrupts to CPUs

Step 1 Disable the irqbalance service.

service irqbalance status
service irqbalance stop

Step 2 Query the interrupt ID corresponding to the NIC.

cat /proc/interrupts | grep enp125s0f0

NOTE

enp125s0f0 is the network port name.

Step 3 Query the CPU to which the NIC interrupt is currently bound. In this example, the interrupt ID is 227.

cat /proc/irq/227/smp_affinity_list

Step 4 Manually bind the NIC interrupt ID to a CPU.

For example, bind the NIC interrupt ID to CPU 4.

echo 4 > /proc/irq/227/smp_affinity_list

NOTICE

You are advised to run the echo 2 > /proc/irq/xxx/smp_affinity_list command to bind the interrupt to the CPU of the NUMA node where the physical NIC is located. In the command, xxx indicates the interrupt ID queried in Step 2. In this example, the interrupt is bound to CPU 2.

----End
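When a NIC exposes many interrupts, the query and bind steps can be generated rather than typed one by one. The Python sketch below is a hedged illustration, not the guide's own tooling: it extracts the interrupt IDs for a port from the text of /proc/interrupts and prints the matching echo commands. Distributing the interrupts round-robin over a list of CPUs is an assumption made for this example; the guide only requires that the CPUs belong to the NUMA node of the NIC.

```python
def nic_irqs(interrupts_text, iface):
    """Return the numeric IRQ IDs whose /proc/interrupts line mentions iface."""
    irqs = []
    for line in interrupts_text.splitlines():
        if iface not in line:
            continue
        # The IRQ ID is the field before the first colon, for example " 227:".
        irq = line.split(":", 1)[0].strip()
        if irq.isdigit():
            irqs.append(int(irq))
    return irqs


def affinity_commands(irqs, cpus):
    """Pair IRQs with CPUs round-robin and return the echo commands to run."""
    return [
        f"echo {cpus[i % len(cpus)]} > /proc/irq/{irq}/smp_affinity_list"
        for i, irq in enumerate(irqs)
    ]
```

The commands are returned as strings so that they can be reviewed before being run as root.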

1.4 Tuning a Docker Container

1.4.1 NUMA Affinity

By default, a container's access to the host's CPU cycles is unrestricted, and most users rely on the default completely fair scheduler (CFS). When multiple containers run on a server, the services in different containers may differ. You can bind containers to CPUs and configure the NUMA affinity to maximize the container performance based on the application scenario.

1:1 CPU Binding and Same-Die Memory Access

vCPUs can be bound to CPUs in the same processor or CPUs in the same NUMA node. Avoid cross-die and cross-chip memory access of a Docker container to prevent performance deterioration. By default, vCPUs of different containers may run on the same physical CPU, which causes CPU resource competition and frequent VMID changes. As a result, L1 TLB flushing occurs frequently and the TLB miss rate is high, causing performance deterioration.

Step 1 Query the NUMA information.

numactl -H

The preceding figure shows the CPU core distribution of the Kunpeng 920 5250 processor. CPU cores 0 to 23 are in NUMA 0, CPU cores 24 to 47 in NUMA 1, CPU cores 48 to 71 in NUMA 2, and CPU cores 72 to 95 in NUMA 3. When binding Docker container vCPUs to cores, avoid cross-die and cross-chip memory access to prevent performance deterioration.

Step 2 Bind each container vCPU to a CPU and assign memory in the same NUMA node to each vCPU.

The Kunpeng 920 5250 processor is used as an example. Create a container named 4u8g_01, bind CPUs 4 to 7 of NUMA 0 to the container, allocate 8 GB memory, use the centos:latest image, mount a local volume, and map /home on the local host to /home on the container.

docker run -d -it --cpus=4 --cpuset-cpus=4-7 --cpuset-mems=0 -m 8192m --name 4u8g_01 -v /home:/home centos:latest

NOTE

For details about the docker run command, see https://docs.docker.com/engine/reference/commandline/run/.

Command format: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Parameters in the command are described as follows:

● -d: runs the container in the background and prints the container ID.

● -i: keeps STDIN open even if no attach operation is performed.

● -t: allocates a pseudo-TTY.

● --cpus: specifies the number of CPUs to be allocated.

● --cpuset-cpus: specifies the CPUs to be bound.

● --cpuset-mems: specifies the NUMA node from which memory is allocated.

● -m: specifies the memory size to be allocated.

● --name: specifies the Docker container name.

● -v: specifies the volume to be bound.

● centos:latest: specifies the local image. centos indicates the repository, and latest indicates the tag.

----End
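A binding is only same-die if every CPU passed to --cpuset-cpus really belongs to the node passed to --cpuset-mems. The Python sketch below is a minimal illustration under stated assumptions: docker_run_args is a hypothetical helper (not a Docker API), and node_cpus is a CPU list that the caller supplies, for example from the numactl -H output. It refuses to build the command when a CPU lies outside the node.

```python
def docker_run_args(name, cpus, numa_node, mem_mb, node_cpus, image="centos:latest"):
    """Build a docker run argument list that pins a container to one NUMA node.

    node_cpus is the list of CPU IDs in numa_node, supplied by the caller.
    """
    outside = sorted(set(cpus) - set(node_cpus))
    if outside:
        raise ValueError(f"CPUs {outside} are not in NUMA node {numa_node}")
    return [
        "docker", "run", "-d", "-it",
        f"--cpus={len(cpus)}",
        "--cpuset-cpus=" + ",".join(str(c) for c in cpus),
        f"--cpuset-mems={numa_node}",
        "-m", f"{mem_mb}m",
        "--name", name,
        "-v", "/home:/home",
        image,
    ]
```

For the example in Step 2, docker_run_args("4u8g_01", [4, 5, 6, 7], 0, 8192, list(range(24))) yields the same options as the command above, with the CPU set written as 4,5,6,7 instead of 4-7. The list can be passed to subprocess.run on a host where Docker is installed.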

Cross-Cluster CPU Binding and Same-Die Memory Access

Kunpeng 920 series processors provide two super CPU clusters (SCCLs). Each SCCL contains six to eight CPU clusters, and each CPU cluster contains four CPUs. When binding CPUs to a Docker container, you are advised to use CPUs across multiple CPU clusters to improve the Docker container performance. This method can reduce bandwidth bottlenecks between the L3 cache and memory caused by CPU competition in the same CPU cluster.

Binding vCPUs of a Docker container to CPUs across multiple CPU clusters can bring the following benefits:

● When the load is light, the memory bandwidth can be fully utilized.

● Competition for L3 cache tags can be dramatically reduced and the memory bandwidth and CPU computing performance can be improved.

NOTICE

If the number of clusters is greater than the number of vCPUs of the container to be created, you can select any clusters.

Step 1 Set cross-cluster CPU binding and same-NUMA memory access.

The Kunpeng 920 5250 processor is used as an example. Create a container named 8u16g_02, bind 8 CPU cores (3, 4, 8, 9, 12, 16, 20, and 21) in NUMA 0 to the container, allocate 16 GB memory to the container, set the image for container creation to centos:latest, and mount a local volume by mapping the local /home directory to the /home directory of the container.

docker run -d -it --cpus=8 --cpuset-cpus=3,4,8,9,12,16,20,21 --cpuset-mems=0 -m 16384m --name 8u16g_02 -v /home:/home centos:latest

NOTE

For details about the docker run command, see https://docs.docker.com/engine/reference/commandline/run/.

Command format: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

The parameters are described as follows:

● -d: runs the container in the background and prints the container ID.

● -i: keeps STDIN open even if no attach operation is performed.

● -t: allocates a pseudo-TTY.

● --cpus: specifies the number of CPUs to be allocated.

● --cpuset-cpus: specifies the CPUs to be bound.

● --cpuset-mems: specifies the NUMA node from which memory is allocated.

● -m: specifies the memory size to be allocated.

● --name: specifies the Docker container name.

● -v: specifies the volume to be bound.

● centos:latest: specifies the local image. centos indicates the repository, and latest indicates the tag.

----End
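Choosing CPUs that span as many clusters as possible can be expressed as a small selection routine. The Python sketch below is one possible approach, not the method used by the guide: it assumes that a CPU cluster is a run of four consecutive CPU IDs within the NUMA node (an assumption that should be checked against the actual topology), takes one CPU from every cluster first, and only then takes a second CPU from each cluster.

```python
def pick_across_clusters(node_cpus, count, cluster_size=4):
    """Pick count CPUs from one NUMA node, spreading them over CPU clusters.

    Assumes clusters are consecutive groups of cluster_size CPU IDs.
    """
    if count > len(node_cpus):
        raise ValueError("not enough CPUs in the node")
    ordered = sorted(node_cpus)
    clusters = [
        ordered[i:i + cluster_size]
        for i in range(0, len(ordered), cluster_size)
    ]
    picked = []
    depth = 0  # position inside each cluster used in the current pass
    while len(picked) < count:
        for cluster in clusters:
            if depth < len(cluster) and len(picked) < count:
                picked.append(cluster[depth])
        depth += 1
    return sorted(picked)
```

With CPUs 0 to 23 in NUMA 0, requesting eight CPUs touches all six clusters, which is the same spreading idea as the hand-picked list 3,4,8,9,12,16,20,21 in Step 1, although the exact CPU IDs differ.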

2 KVM Tuning Guide

2.1 Introduction

2.2 Tuning the Hardware

2.3 Tuning the OS

2.4 Tuning the Virtualization Settings

2.5 Using the Automatic Tuning Tool HiKVMPerf

2.1 Introduction

KVM Virtualization Architecture

The Kernel-based Virtual Machine (KVM) virtualization software is deployed on physical servers to virtualize hardware resources, so that one physical server can be used as multiple virtual servers. Figure 2-1 shows the components of the KVM platform. QEMU-KVM virtualizes computing, storage, and network resources.

Figure 2-1 KVM virtualization architecture

Tuning Flow

This document describes how to adjust the hardware parameters, OS, and virtualization settings on a TaiShan server with CentOS 7.6 to achieve optimal KVM virtualization performance.

NOTE

Install KVM. For details, see KVM VM Installation Guide.

Figure 2-2 shows the process of performance tuning on the KVM platform.

Figure 2-2 Three tips for KVM performance tuning

Top N Tuning Items

The following table lists the most frequently used tuning items that affect or improve the KVM performance. You can select proper tuning items based on your requirements to achieve optimal performance.

Tuning Item | Description | Application Scenario | Remarks
BIOS | Set the memory refresh rate to Auto. | For commercial use | Significantly improves memory bandwidth performance. For details, see Setting the Memory Refresh Rate to Auto.
NUMA affinity | Ensure that each vCPU and its memory are in the same physical node to prevent cross-die and cross-chip memory access. | For commercial use | Significantly improves CPU virtualization performance. For details, see 2.4.2 Setting 1:1 Core Binding and Same-Die Memory Access.
CPU core binding | Bind each vCPU to a core. | For commercial use | Significantly improves CPU virtualization performance. For details, see 2.4.1 Binding the VM to Cores.
CPU core binding | Bind vCPUs to cores in a CPU cluster. | For POC tests only | Significantly improves memory bandwidth performance. For details, see 2.4.1 Binding the VM to Cores.
Memory huge page | Disable the transparent huge page feature and use 512 MB huge pages. | For commercial use | Significantly improves CPU virtualization performance by about 5% in CentOS 7.6. For details, see 2.4.5 Using Huge Pages for the VM.

2.2 Tuning the Hardware

2.2.1 Checking Hardware Validity

Before installing and deploying the KVM environment, ensure that:

● The hardware (including servers and NICs) is in the TaiShan server compatibility list.

● The OS is in the TaiShan server compatibility list and supports KVM. This document uses CentOS 7.6 and openEuler 20.03 as examples.

● The software (including the BIOS and BMC) is the latest version released on the Huawei enterprise support website.

2.2.2 Modifying BIOS Settings

Modify the advanced settings in the BIOS to improve the performance of the virtualization platform. Table 2-1 lists the recommended performance-related BIOS configuration options on TaiShan servers.

Table 2-1 BIOS performance configuration options

BIOS Configuration Option | Recommended Value | Description
Custom Refresh Rate | Auto | Memory refresh rate. The default value is 32 ms. Choose Advanced > Memory Config > Custom Refresh Rate.
NUMA | Enable | Whether to enable the NUMA function (the default value is Enable). Choose Advanced > Memory Config > NUMA.
Stream Write Mode | Allocate share LLC | Stream write mode (the default value is Allocate share LLC). Choose Advanced > Performance Config > Stream Write Mode.
CPU Prefetching Configuration | Enabled | CPU prefetching configuration (the default value is Enabled). Choose Advanced > MISC Config > CPU Prefetching Configuration.
SRIOV | Enable | SR-IOV option (the default value is Enable). Choose Advanced > PCIe Config > SRIOV.
Support Smmu | Enabled | SMMU option (the default value is Enabled). Choose Advanced > MISC Config > Support Smmu.

Setting the Memory Refresh Rate to Auto

Step 1 Restart the server, enter the BIOS, and choose Advanced > Memory Config >Custom Refresh Rate.

Step 2 Set Custom Refresh Rate to Auto and press F10 to save the BIOS settings.

----End

Enabling NUMA

Step 1 Restart the server, enter the BIOS, and choose Advanced > Memory Config > NUMA.

Step 2 Set NUMA to Enable and press F10 to save the BIOS settings.

----End

Setting Stream Write Mode

Step 1 Restart the server, enter the BIOS, and choose Advanced > Performance Config > Stream Write Mode.

Step 2 Set Stream Write Mode to Allocate share LLC and press F10 to save the BIOS settings.

----End

Enabling CPU Prefetching

After CPU prefetching is enabled, the CPU prefetches the next instruction from the memory into the cache to improve system efficiency.

NOTE

● When using lmbench to test the memory bandwidth, you are advised to enable CPU prefetching.

● When using lmbench to test memory latency, you are advised to disable CPU prefetching.

Step 1 Restart the server, enter the BIOS, and choose Advanced > MISC Config > CPU Prefetching Configuration.

Step 2 Set CPU Prefetching Configuration to Enabled and press F10 to save the BIOS settings.

----End

Enabling SR-IOV

Step 1 Restart the server, enter the BIOS, and choose Advanced > PCIe Config > SRIOV.

Step 2 Set SRIOV to Enable and press F10 to save the BIOS settings.

----End

Enabling SMMU

The system memory management unit (SMMU) is an important component provided by Kunpeng 920 series processors for implementing virtualization extensions.

Step 1 Restart the server, enter the BIOS, and choose Advanced > MISC Config > Support Smmu.

Step 2 Set Support Smmu to Enabled and press F10 to save the BIOS settings.

----End

NOTICE

If the server has an Avago SAS3408iMR RAID controller card, set Support Smmu in the BIOS to Disabled.

2.2.3 Tuning the DIMM Configuration

Use the Intelligent Computing Product Memory Configuration Assistant to configure DIMMs.

https://support-it.huawei.com/smca/?language=en

2.3 Tuning the OS

Upgrading gcc and glibc (CentOS)

NOTE

This section applies only to CentOS 7.6.

In CentOS 7.6, the default gcc version is 4.8.5 and the default glibc version is 2.17. Software compilation requires certain gcc and glibc versions. Upgrading the gcc and glibc versions can improve the performance of some programs.

Table 2-2 describes the recommended gcc and glibc versions in the VM OS.

Table 2-2 gcc and glibc versions

Compiler | Version | How to Obtain
gcc | 7.3.0 | https://ftp.gnu.org/gnu/gcc/
glibc | 2.27 | https://ftp.gnu.org/gnu/libc/

Disabling Transparent Huge Pages

The transparent huge page (THP) function can reduce the complexity of using large pages. Currently, THP has been tested and optimized in various systems, configurations, programs, and loads to improve the performance of most system configurations. However, when the STREAM tool is used to test memory bandwidth, or when services are memory access-intensive, disabling transparent huge pages can effectively improve performance.

● If the page size in the VM OS is 64 KB, disable transparent huge pages.

● If the page size is 4 KB, you do not need to disable transparent huge pages.

Step 1 Query the THP configuration.

cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

NOTE

If [always] is displayed in the command output, THP is enabled. If [never] is displayed, THP is disabled. If [madvise] is displayed, THP is enabled only for the virtual memory area (VMA) specified by MADV_HUGEPAGE.

Step 2 Check the value of AnonHugePages in the meminfo file in the /proc directory.

cat /proc/meminfo | grep -i huge

If the value is not 0, THP has taken effect. See the following figure.

Step 3 Disable THP.

echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled

----End

NOTE

To enable transparent huge pages, run the following command:

echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled
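The checks in this procedure read two pieces of text: the bracketed value in the THP sysfs files and the AnonHugePages line of /proc/meminfo. The Python sketch below shows one way to interpret both. It is an illustrative example with hypothetical function names, and it works on text passed in by the caller rather than reading the files itself.

```python
import re


def active_thp_mode(text):
    """Return the bracketed value from a THP sysfs file, for example 'never'."""
    match = re.search(r"\[([^\]]+)\]", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


def anon_huge_kb(meminfo_text):
    """Return the AnonHugePages value in kB from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("AnonHugePages:"):
            return int(line.split()[1])
    raise ValueError("AnonHugePages not found")
```

A result of "never" from active_thp_mode together with 0 from anon_huge_kb corresponds to THP being disabled and no transparent huge pages being in use.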

2.4 Tuning the Virtualization Settings

2.4.1 Binding the VM to Cores

Binding the QEMU Process to Physical CPUs

VMs running on the same server carry different services and therefore occupy resources to different degrees. For storage I/O-intensive VMs, the storage I/O processes of different VMs need to be completely isolated so that adjacent VMs do not interfere with each other. The QEMU process is the main service process that handles front-end and back-end services, so it needs to be isolated.

NOTE

When binding a VM to cores, it is recommended that physical CPUs 0 to 3 be reserved and not used.

Step 1 Edit the VM XML configuration file.
virsh edit vm1

NOTE

vm1 is the VM name.

Step 2 Add the following settings to the XML file.
<domain type='kvm'>
...
  <vcpu placement='static' cpuset='4-7'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='4-7'/>
  </cputune>
...
</domain>


NOTE

● emulatorpin cpuset='4-7': binds the QEMU main thread to physical CPUs 4 to 7.

● vcpu placement='static' cpuset='4-7': restricts the I/O thread and worker threads to cores 4 to 7. If this parameter is not configured, the VM task threads can run on any core, which results in more cross-NUMA and cross-die overhead.

● vcpupin: binds each vCPU thread to a core. If vcpupin is not used, the vCPU threads may switch between cores 4 to 7, causing extra overhead.

----End

Cross-CPU Cluster Core Binding

Kunpeng 920 series processors provide two super CPU clusters (SCCLs). Each SCCL contains six to eight CPU clusters, and each CPU cluster contains four cores. When binding CPUs to a KVM VM, you are advised to use CPUs across multiple CPU clusters to improve VM performance. This reduces the bandwidth bottleneck between the L3 cache and memory that arises when cores in the same CPU cluster compete with each other.

Step 1 Query the NUMA node information and topology in the Linux system.
numactl -H

Step 2 Edit the VM XML configuration file in the Linux system and bind vCPUs to cores in as many CPU clusters as possible. The following is an example:
<domain type='kvm'>
...
  <vcpu placement='static' cpuset='4,5,8,9,12,16,22,23'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='8'/>
    <vcpupin vcpu='3' cpuset='9'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='16'/>
    <vcpupin vcpu='6' cpuset='22'/>
    <vcpupin vcpu='7' cpuset='23'/>
  </cputune>
...
</domain>

----End
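To judge how well a core list spreads across CPU clusters, the clusters it touches can be counted. The following is a rough sketch, not a Huawei tool. It assumes that cores are numbered contiguously so that cores 0 to 3 form cluster 0, cores 4 to 7 form cluster 1, and so on. This numbering is an assumption and should be checked against the actual topology of the server. The function name count_clusters is hypothetical.

```shell
# ASSUMPTION: four cores per CPU cluster (from the text above) and contiguous
# core numbering, so that cluster index = core number / 4.
CORES_PER_CLUSTER=4

# Print the number of distinct CPU clusters covered by a comma-separated core list.
count_clusters() {
    printf '%s\n' "$1" | tr ',' '\n' |
        awk -v n="$CORES_PER_CLUSTER" '{ seen[int($1 / n)] = 1 }
            END { c = 0; for (k in seen) c++; print c }'
}

count_clusters "4,5,8,9,12,16,22,23"   # prints: 5  (clusters 1, 2, 3, 4, 5)
count_clusters "4,5,6,7,8,9,10,11"     # prints: 2  (clusters 1, 2)
```

Under this assumption, the example cpuset above spreads eight vCPUs over five clusters, whereas eight consecutive cores would share only two.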


2.4.2 Setting 1:1 Core Binding and Same-Die Memory Access

The vCPUs of a VM can be bound to cores in the same processor or to cores in the same NUMA node. Avoid cross-die and cross-chip memory access to prevent performance deterioration. By default, vCPUs of different VMs may run on the same physical core, which causes CPU resource competition and frequent VMID changes. As a result, the L1 TLB is flushed frequently and the TLB miss rate is high, which degrades performance.

Step 1 Query the NUMA node information in the Linux system.
numactl -H

Step 2 Edit the VM XML file in the Linux system, bind each vCPU to a core, and ensure that the cores are in the same die (NUMA). The following is an example:
<domain type='kvm'>
...
  <vcpu placement='static' cpuset='4-7'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='4-7'/>
  </cputune>
...
</domain>

Step 3 Configure one NUMA node (NUMA 0) for the VM.
<domain type='kvm'>
...
  <vcpu placement='static' cpuset='4-7'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='4-7'/>
  </cputune>
...
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
...
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='1'/>
    <numa>
      <cell id='0' cpus='0-3' memory='8388608' unit='KiB'/>
    </numa>
  </cpu>
...
</domain>

NOTE

● The strict mode does not allow cross-NUMA memory allocation. The preferred mode allocates memory from the specified NUMA node first and falls back to another NUMA node if the memory is insufficient. The strict mode is recommended for POC and comparison tests, where it can improve performance by about 5%.

● <numa> describes the NUMA topology of the VM. cpus specifies the vCPU IDs, and memory specifies the memory size of the vnode.

● To improve VM performance, set <numatune> and <cputune> so that the vCPUs and their memory are in the same physical NUMA node.

● The value of cellid in <numatune> is the same as the value of cell id in <numa>. nodeset specifies the physical NUMA node. mode can be set to one of the following values:
  - strict: allocates memory only from the specified node. Allocation fails if the memory is insufficient.
  - preferred: allocates memory from the specified node first, and from another node if the memory is insufficient.
  - interleave: allocates memory from the specified nodes in an interleaved manner.

----End
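For VMs with many vCPUs, writing the <vcpupin> elements by hand is error-prone. The following sketch generates them for a 1:1 binding onto consecutive physical cores. The function name gen_vcpupin is hypothetical, and the output is meant to be pasted into the <cputune> section with virsh edit, not applied automatically.

```shell
# Print one <vcpupin> element per vCPU, pinning vCPU i to physical core first_core + i.
# $1: first physical core; $2: number of vCPUs.
gen_vcpupin() {
    i=0
    while [ "$i" -lt "$2" ]; do
        printf "<vcpupin vcpu='%d' cpuset='%d'/>\n" "$i" "$(( $1 + i ))"
        i=$(( i + 1 ))
    done
}

# Reproduces the four pinning lines used in the examples above (cores 4 to 7).
gen_vcpupin 4 4
```

Before using the output, confirm with numactl -H that all generated cores belong to the NUMA node named in <numatune>.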

2.4.3 Configuring the VM NIC

When a VM uses the NIC passthrough function, you can adjust the number of NIC queues and bind interrupts to CPUs to achieve optimal network performance and prevent the network from becoming a performance bottleneck.

Adjusting the Number of NIC Queues

Step 1 Run the following command on the VM to query the number of NIC queues:
ethtool -l eth1

The preceding information shows that there are 63 queues.

Step 2 Run the top command to check the CPU usage for processing software interrupts. If the number of queues is insufficient, the CPU usage for processing software interrupts is 100%. In this case, you need to add queues.


Step 3 Dynamically adjust the number of queues based on the performance. For example, run the following command to change the number of queues to 48:
ethtool -L eth1 combined 48

NOTE

More queues do not necessarily mean better performance. You need to observe the CPU usage for processing software interrupts to determine whether a performance bottleneck exists.

----End
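The output of ethtool -l lists the hardware maximums first and the current settings second, so a script that checks the queue count must read the second Combined value. The following is a minimal sketch; the function name current_combined is hypothetical, and the sample text with its values is illustrative, not captured from a real NIC.

```shell
# Print the current "Combined" channel count from `ethtool -l` output read on stdin.
# The output has a "Pre-set maximums:" block followed by a "Current hardware
# settings:" block; only the second block reflects the queues in use.
current_combined() {
    awk '/^Current hardware settings/ { cur = 1 }
         cur && $1 == "Combined:" { print $2; exit }'
}

# Sample text stands in for a real NIC.
printf '%s\n' \
    'Channel parameters for eth1:' \
    'Pre-set maximums:' \
    'RX:             0' \
    'TX:             0' \
    'Other:          0' \
    'Combined:       63' \
    'Current hardware settings:' \
    'RX:             0' \
    'TX:             0' \
    'Other:          0' \
    'Combined:       48' | current_combined   # prints: 48
```

On a VM, the equivalent call would be ethtool -l eth1 | current_combined.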

Binding NIC Interrupts to CPUs

Step 1 Disable the irqbalance service on the VM.
service irqbalance status
service irqbalance stop

Step 2 Run the following command to query the NIC interrupt ID:
cat /proc/interrupts | grep eth1

NOTE

eth1 is the network port name.

Step 3 Manually bind NIC interrupts to different CPUs.
echo 2 > /proc/irq/xxx/smp_affinity_list

NOTE

xxx is the interrupt ID obtained in Step 2.

----End

NOTICE

You are advised to bind interrupts to vCPUs of the NUMA node where the physical NIC resides. You can run the following command on the host to view the NUMA node where the NIC resides:
cat /sys/bus/pci/devices/0000\:03\:00.0/numa_node

2.4.4 Using virtio-blk for VM Storage

virtio-blk virtual drives provide higher storage performance than virtio-scsi virtual drives. If VM services do not depend on virtio-scsi drives and have high performance requirements, you are advised to use virtio-blk virtual drives.

Step 1 Add configuration in an XML file to use virtio-blk for the VM. The following is an example:
<domain type='kvm'>
...
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/home/kvm/images/node1.img'/>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </disk>
...
</domain>

----End

2.4.5 Using Huge Pages for the VM

The huge page function ensures that all memory of a VM always exists as huge pages on the host and is physically contiguous. This effectively reduces Translation Lookaside Buffer (TLB) misses and significantly improves the performance of memory-intensive services. If the VM uses huge pages, you can disable transparent huge pages (see Disabling Transparent Huge Pages) to reduce overhead on the host and improve VM performance stability.

NOTE

The size of each huge page varies with the OS type. UVP requires 1 GB, and CentOS 7.6 in this document requires 512 MB. Reserve 15% of the total memory for the host when configuring huge pages in the virtualization scenario.

Step 1 On the host, check the huge page allocation on each NUMA node.
cat /sys/devices/system/node/node*/meminfo | grep Huge

If the value of HugePages_Total is 0, no huge page is configured.

Step 2 Configure the VM to use 512 MB huge pages.

NOTICE

Reserve 15% of the total memory for the host when configuring huge pages in the KVM scenario.

For example, to create a 4U8G VM, allocate 300 x 512 MB huge pages on the host.


● CentOS

a. Edit the /boot/efi/EFI/centos/grub.cfg file on CentOS 7.6.
vi /boot/efi/EFI/centos/grub.cfg

b. Add the following settings to the line starting with linux.
default_hugepagesz=512M hugepagesz=512M hugepages=300

See the following figure:

● openEuler

a. Edit the /etc/grub2-efi.cfg file on openEuler-20.03-LTS-SP1.
vi /etc/grub2-efi.cfg

b. Add the following settings to the line starting with linux.
default_hugepagesz=512M hugepagesz=512M hugepages=256 pci=realloc

See the following figure:

Step 3 Restart the server.
reboot

Step 4 Log in to the OS again and check the huge page configuration.
cat /proc/sys/vm/nr_hugepages

Step 5 On the host, check the huge page allocation on each NUMA node.
cat /sys/devices/system/node/node*/meminfo | grep Huge


NOTE

The preceding figure shows that each NUMA node (node0 to node3) has 75 x 512 MB huge pages.

You can also run the following commands to view the huge page configuration of a NUMA node.

The following uses node0 as an example.

cat /sys/devices/system/node/node0/hugepages/hugepages-524288kB/nr_hugepages
cat /sys/devices/system/node/node0/hugepages/hugepages-524288kB/free_hugepages

NOTE

The value 75 in nr_hugepages indicates that the system has allocated 75 x 512 MB huge pages to the NUMA node. The value 75 in free_hugepages indicates that 75 x 512 MB huge pages are still free on the node.

Step 6 Check whether hugetlbfs has been mounted.
mount | grep hugetlbfs

Information in the preceding figure shows that hugetlbfs has been mounted.

Step 7 Edit the XML file to configure huge pages for the VM. The following is an example:
<domain type='kvm'>
...
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static' cpuset='12-15'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='12'/>
    <vcpupin vcpu='1' cpuset='13'/>
    <vcpupin vcpu='2' cpuset='14'/>
    <vcpupin vcpu='3' cpuset='15'/>
    <emulatorpin cpuset='12-15'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
...
</domain>

----End
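The huge page counts used in this procedure follow from simple arithmetic. The hugepages= kernel parameter sizes the host-wide pool, so it has to cover every VM on the host while leaving 15% of host memory unreserved. The sketch below shows both calculations. The function names and the 256 GiB host size are hypothetical and chosen only for illustration.

```shell
HUGEPAGE_MB=512   # huge page size used for CentOS 7.6 in this guide

# Number of huge pages needed to back a VM with $1 GiB of memory.
pages_for_vm() {
    echo $(( $1 * 1024 / HUGEPAGE_MB ))
}

# Largest huge page count that still leaves 15% of host memory ($1, in GiB) unreserved.
max_host_pages() {
    echo $(( $1 * 1024 * 85 / 100 / HUGEPAGE_MB ))
}

pages_for_vm 8       # prints: 16
max_host_pages 256   # prints: 435
```

Integer division rounds down, which keeps the result on the safe side of the 15% reservation.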

2.5 Using the Automatic Tuning Tool HiKVMPerf

2.5.1 Application Scenario

NOTICE

The HiKVMPerf tool applies only to TaiShan servers that run CentOS 7.6 and have not yet been tuned for virtualization. To prevent conflicts between the settings added by the tool and user-defined settings, ensure that the initial VM XML file is used. This tool is intended for POC tests and benchmark tests. Read the preceding chapters in this document before using the tool.

Download address: https://mirrors.huaweicloud.com/kunpeng/archive/kunpeng_solution/cloud/KVM/HiKVMPerf.zip


The automatic tuning tool HiKVMPerf supports the following tuning items:

1. Disabling transparent huge pages
2. Configuring 512 MB huge pages
3. NUMA Aware for VM CPUs and memory (range-based core binding)

You can select the tuning items according to Chapter 3 and Chapter 4.

2.5.2 Usage Guidelines

NOTICE

Shut down all VMs before tuning and restoration, and restart the host OS after tuning and restoration are complete to make the settings take effect.

This tool is a shell script. It takes one of three parameters: tuning, verify, or restore, which perform tuning, verification after tuning, and restoration after tuning, respectively. Contact the R&D team to obtain the tool.

Tuning

NOTICE

You need to run the following command in the directory of the HiKVMPerf.sh file to perform tuning. Before running the command, ensure that the VM is shut down.
sh HiKVMPerf.sh tuning

The tuning process is as follows:

Step 1 Back up the original configuration, including the grub.cfg file and the VM XML file, to the backup directory in the current directory.

Step 2 (Optional) Disable THP.

Step 3 (Optional) Configure 512 MB huge pages.

Step 4 (Optional) Configure NUMA Aware.


Set the number of reserved cores. The tool binds all VMs on the physical machine to cores, and some cores need to be reserved for the physical machine. The value range is 0 to 24, and the recommended value is 4. The tool automatically redefines the VM after the XML file is modified.

Step 5 Restart the physical machine for the settings to take effect.
reboot

----End

NOTE

The tool checks the CPU and memory overcommitment. Memory overcommitment is not allowed. The maximum memory usage cannot exceed 80%, and the maximum CPU overcommitment ratio is 1:3.

Verification After Tuning

Check whether the settings take effect after restart.

Step 1 Run the following command in the directory of the HiKVMPerf.sh file to verify the tuning result:
sh HiKVMPerf.sh verify

Step 2 Run the following commands to check whether the settings have taken effect:

1. Check whether THP is disabled.
cat /sys/kernel/mm/transparent_hugepage/enabled

2. Check whether the memory huge page settings have taken effect.
cat /proc/sys/vm/nr_hugepages
cat /sys/devices/system/node/node*/meminfo | grep Huge


3. View the VM XML file (including the memory huge pages and NUMA Aware configuration).
For example, run the following command to check the configuration of VM 1:
virsh dumpxml vm1

The preceding figure shows that THP has been disabled, the huge page function has been enabled, and the VM has been bound to cores.

----End

Restoration After Tuning

Step 1 Run the following command to check whether the VM is shut down. The VM must be shut down before the restoration.
virsh list --all


Step 2 Run the following command in the directory of the HiKVMPerf.sh file to perform restoration.
sh HiKVMPerf.sh restore

Step 3 Reboot the system to make the operation take effect.
reboot

----End


3 OpenStack Tuning Guide (CentOS 7.6)

3.1 Introduction

3.2 Tuning Hardware

3.3 OS Tuning

3.4 VM Tuning

3.1 Introduction

3.1.1 OpenStack Overview

OpenStack is a community, a project, and an open-source software application for building public and private clouds. It provides a cloud platform, or tool set, that helps organizations deploy and run clouds offering virtual computing or storage services. It delivers scalable and flexible cloud computing for public and private clouds of any size.

As an open-source cloud computing management platform, OpenStack consists of multiple main components and supports almost all types of cloud environments. It aims to provide a cloud computing management platform that is easy to implement, scales to large deployments, is rich in functions, and follows unified standards. OpenStack provides an infrastructure as a service (IaaS) solution through various complementary services. Each service provides an API for integration.

This optimization guide applies to the OpenStack Stein version.


3.1.2 Environment

The OpenStack optimization verification cluster consists of one management node, six compute nodes, and three storage nodes. Table 3-1 shows the environment configuration.


Table 3-1 Environment configuration requirements

| Device Type | Model | Configuration | Quantity |
|---|---|---|---|
| Management nodes or network nodes | TaiShan server (model 2280) | CPUs: 2 x Huawei Kunpeng 920 5250 processors; Memory: 12 x 32 GB DDR4 DIMMs; System drives: 2 x 480 GB SATA SSDs (2 SSDs form a RAID 1 array); RAID controller card: LSI 3508; Network devices: four GE electrical ports, and 1 x 10GE NIC with four optical ports (1822) | 1 |
| Compute nodes | TaiShan server (model 2280) | CPUs: 2 x Huawei Kunpeng 920 5250 or 7260 processors; Memory: 16 x 32 GB DDR4 DIMMs; System drives: 2 x 480 GB SATA SSDs (2 SSDs form a RAID 1 array); Data disks: 2 x 8 TB SATA HDDs; RAID controller card: LSI 3508; Network devices: four GE electrical ports, and 1 x 10GE NIC with four optical ports (1822) | 3 nodes with 5250 CPUs, 2 nodes with 7260 CPUs, 5 in total |
| Compute nodes | TaiShan server (model 2480) | CPUs: 4 x Huawei Kunpeng 920 7260 processors; Memory: 32 x 32 GB DDR4 DIMMs; System drives: 2 x 480 GB SATA SSDs (2 SSDs form a RAID 1 array); Data disks: 2 x 8 TB SATA HDDs; RAID controller card: LSI 3508; Network devices: four GE electrical ports, and 1 x 10GE NIC with four optical ports (1822) | 1 node with 7260 CPUs |
| Storage nodes | TaiShan server (model 2280) | CPUs: 2 x Huawei Kunpeng 920 5250 processors; Memory: 6 x 32 GB DDR4 DIMMs; System drives: 2 x 480 GB SATA SSDs (2 SSDs form a RAID 1 array); Acceleration disks: 2 x 3.2 TB NVMe SSDs; Data disks: 12 x 8 TB HDDs; RAID controller card: LSI 3508; Network devices: two GE electrical ports, and 1 x 25GE/10GE NIC with two optical ports | 3 |

3.1.3 Tuning Guidelines

Observe the following guidelines when tuning the performance:

● Analyze resource bottlenecks from multiple aspects to identify the root cause. Poor performance in one aspect may be caused by a problem in another. For example, high CPU usage may be caused by insufficient memory capacity, where the CPU resources are exhausted by memory scheduling.

● Adjust only one performance-related parameter at a time. When multiple parameters are adjusted at the same time, it is difficult to determine which parameter caused the change in performance.

● During system performance analysis, the analysis tool itself occupies system resources, such as CPU and memory. Running the tool may therefore worsen a resource bottleneck in some aspects of the system.

3.1.4 Tuning Process Flow

Figure 3-1 shows the tuning and analysis process flow for the OpenStack virtualization environment.


Figure 3-1 Tuning process flow

The tuning analysis flow is as follows:

1. If the performance metrics do not meet requirements, first locate the problem. Check whether a command from the OpenStack client (controller node) is successfully sent to the servers (compute or storage nodes).

2. If the server does not receive much stress, check whether the problem is caused by the client (with a low probability) or the network.

3. If the problem is caused by the servers, check the metrics of the hardware, such as the CPU, memory, drives, and network devices. Perform further analysis based on the abnormal hardware metrics.

4. Analyze the applications running on the VM based on hardware metrics. For example, analyze the algorithm design, number of threads, and cache mode. Develop an optimization solution for each application specifically.

3.2 Tuning Hardware

3.2.1 Modifying BIOS Settings

Purpose

Configure advanced BIOS settings to improve server performance.

Procedure

Table 3-2 shows the recommended configuration.


Table 3-2 Recommended BIOS configuration

| BIOS Configuration Option | Recommended Value | Default Value | Description |
|---|---|---|---|
| Custom Refresh Rate | Auto | 32ms | Memory refresh rate. Choose Advanced > Memory Config > Custom Refresh Rate. |
| NUMA | Enable | Enable | NUMA feature switch. Choose Advanced > Memory Config > NUMA. |
| Die Interleaving | Disable | Disable | Die interleaving. Choose Advanced > Memory Config > Die Interleaving. |
| One NUMA Per Socket | Disable | Disable | Enables one NUMA node on each socket. Choose Advanced > Memory Config > One NUMA Per Socket. |
| Stream Write Mode | Allocate share LLC | Allocate share LLC | Stream write mode. Choose Advanced > Performance Config > Stream Write Mode. |
| CPU Prefetching Configuration | Enabled | Enabled | Enables CPU prefetching. Choose Advanced > MISC Config > CPU Prefetching Configuration. |
| SRIOV | Enable | Enable | SR-IOV option. Choose Advanced > PCIe Config > SRIOV. |
| Support Smmu | Enabled | Disabled | SMMU function option. Choose Advanced > MISC Config > Support Smmu. |
| Power Policy | Performance | Efficiency | Power policy. Choose Advanced > Performance Config > Power Policy. |

Table 3-3 lists the recommended extra configuration for a 4-socket server.


Table 3-3 Recommended extra BIOS configuration

| BIOS Configuration Option | Recommended Value | Default Value | Description |
|---|---|---|---|
| Cache Mode | in:private/out:private | in:partition/out:share | Cache mode. Choose Advanced > Performance Config > Cache Mode. |

NOTE

The preceding configuration applies only to virtualization scenarios.

Step 1 Restart the server and enter the BIOS.

Step 2 Perform the following steps to set the parameters based on the recommended values in the preceding tables:

1. Configure the memory refresh frequency.
Enter the BIOS, choose Advanced > Memory Config > Custom Refresh Rate, and set the memory refresh frequency to Auto.

2. Set NUMA.
Enter the BIOS, choose Advanced > Memory Config > NUMA, and set NUMA to Enable, Die Interleaving to Disable, and One NUMA Per Socket to Disable.

3. Enable SRIOV.
Enter the BIOS, choose Advanced > PCIe Config > SRIOV, and set SRIOV to Enable.

4. Enable SMMU.
Enable the SMMU feature only in virtualization scenarios. In non-virtualization scenarios, disable the SMMU.
Enter the BIOS, choose Advanced > MISC Config > Support Smmu, and set Support Smmu to Enable.

5. Enable CPU prefetching.
Enter the BIOS, choose Advanced > MISC Config > CPU Prefetching Configuration, and set CPU Prefetching Configuration to Enable.

6. Configure the power policy.
Enter the BIOS, choose Advanced > MISC Config > Power Policy, and set Power Policy to Performance.

7. Set Stream Write Mode.
Enter the BIOS, choose Advanced > Performance Config > Stream Write Mode, and set Stream Write Mode to Allocate share LLC.

8. Set Cache Mode (only for the 4-socket environment).
Enter the BIOS, choose Advanced > Performance Config > Cache Mode, and set Cache Mode to in:private out:private.

Step 3 Press F10 to save the BIOS settings and restart the server.

----End

3.2.2 Configuring the Memory

Purpose

Change the quantity, frequency, and insertion method of the DIMMs to improve the memory performance.

Procedure

Use the Intelligent Computing Product Memory Configuration Assistant to configure DIMMs.

https://support-it.huawei.com/smca/?language=en

It is recommended that each CPU be fully configured with eight DDR4 2933 MHz 2-rank DIMMs, which occupy eight channels and maximize memory performance.

3.3 OS Tuning


3.3.1 Checking the OS and Kernel

Purpose

In OpenStack virtualization scenarios, CentOS is used by default. Currently, the CentOS 7.6 aarch64 version and its default kernel version are recommended.

Procedure

Step 1 Run the following command to check the OS version.
cat /etc/redhat-release

The following is an example of the output result:

Step 2 Run the following command to check the kernel version.
uname -r

The following is an example of the output result:

Step 3 Run the following command to check the number of NUMA nodes:
numactl -H

The following is an example of the command output, which indicates that there are four NUMA nodes:

For a Kunpeng ARM-based server, there are two NUMA nodes for each socket. That is, there are four NUMA nodes in a 2-socket environment and eight in a 4-socket environment. If the number of nodes is incorrect, perform the following checks. Otherwise, skip them and go to Step 4 to check the Hi1822 NIC driver.

Check whether Die Interleaving and One NUMA Per Socket are correctly set in the BIOS.


| BIOS Configuration Option | Recommended Value | Description |
|---|---|---|
| Die Interleaving | Disable | Die interleaving (the default value is Disable). Choose Advanced > Memory Config > Die Interleaving. |
| One NUMA Per Socket | Disable | There is one NUMA node for each socket (the default value is Disable). Choose Advanced > Memory Config > One NUMA Per Socket. |

Check the kernel option CONFIG_NODES_SHIFT. The value of the option must be at least 2 for the 2-socket environment and at least 3 for the 4-socket environment. If the value is too small, recompile the kernel.

cat /boot/config-$(uname -r) | grep CONFIG_NODES_SHIFT

Step 4 Check the Hi1822 NIC driver.

Run the top command when the CPU load is light to check whether the software interrupt rate is abnormally high. If the following information is displayed, go to Step 5 to upgrade the Hi1822 NIC driver. If the following information is not displayed, skip Step 5.

NOTE

Generally, run the ethtool -i xxx command to check the NIC information. If the driver version in the command output is empty, you need to upgrade the driver. xxx indicates the name of the Hi1822 NIC in use.


Step 5 Upgrade the NIC driver.

Obtain the latest driver package from the following website and upgrade the driver by referring to the guide:

https://support.huawei.com/enterprise/en/intelligent-accelerator-components/in500-solution-pid-23507369/software

For details about the upgrade, see Huawei IN200 NIC User Guide.

NOTE

If the operating system kernel needs to be modified, the precompiled driver is not applicable. Apply for the source code at http://support.huawei.com and recompile the driver. For details, see version "IN500 solution 5.1.0.SPC102" of the driver.

----End
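The CONFIG_NODES_SHIFT requirement checked in Step 3 follows from how the kernel sizes its node tables: it supports at most 2^CONFIG_NODES_SHIFT NUMA nodes. The following sketch computes the smallest sufficient value for a given node count. The function name min_nodes_shift is hypothetical.

```shell
# Print the smallest shift s such that 2^s >= $1 (the number of NUMA nodes).
min_nodes_shift() {
    s=0
    while [ $(( 1 << s )) -lt "$1" ]; do
        s=$(( s + 1 ))
    done
    echo "$s"
}

min_nodes_shift 4   # prints: 2  (2-socket server, four NUMA nodes)
min_nodes_shift 8   # prints: 3  (4-socket server, eight NUMA nodes)
```

The results match the minimum values stated in the procedure for the 2-socket and 4-socket environments.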

3.3.2 Binding Service Processes to Cores

Purpose

On a compute node, some OpenStack service processes occupy a large amount of CPU resources on the physical machine while they run. Bind these processes to CPU cores to stabilize VM performance.

Procedure

Step 1 Run the top command to check the processes with high physical CPU usage. The following is an example result:

Step 2 Run the taskset command to bind the corresponding processes to cores 0-3.
ps -ef | grep -E "nova-compute|cinder-volume|ovs-vswitchd|libvirtd" | grep -v grep | awk '{print $2}' | xargs -i taskset -pc 0-3 {}

----End
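Before changing affinities on a production node, it can help to print the taskset commands first and review them. The following is a dry-run sketch under stated assumptions, not part of the original procedure. The function name plan_taskset and the sample process list are hypothetical. The -a option of taskset, which applies the affinity to all threads of a process, is an addition to the command in Step 2. The pattern is passed through the environment so that it does not appear on the awk command line and match itself in a live process list.

```shell
# Read "PID COMMAND..." lines on stdin (for example from `ps -eo pid,args`) and
# print one taskset command for every process whose command line matches the
# extended regex in $1. $2 is the CPU list. Nothing is executed (dry run).
plan_taskset() {
    PAT="$1" CPUS="$2" awk '
        $1 ~ /^[0-9]+$/ {
            pid = $1
            $1 = ""
            if ($0 ~ ENVIRON["PAT"]) print "taskset -a -pc " ENVIRON["CPUS"] " " pid
        }'
}

# Illustrative process list; the PIDs and paths are made up.
printf '%s\n' \
    '  PID COMMAND' \
    '  101 /usr/bin/python2 /usr/bin/nova-compute' \
    '  202 /usr/sbin/sshd -D' \
    '  303 ovs-vswitchd unix:/var/run/openvswitch/db.sock' |
    plan_taskset 'nova-compute|ovs-vswitchd' 0-3
```

For the sample input, the sketch prints one command for PID 101 and one for PID 303. After review, the printed commands can be run as root.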


Note

Currently, the following processes are recommended for core binding (bound to CPUs 0-3): nova-compute, cinder-volume, ovs-vswitchd, and libvirtd. You can add or delete processes and specify which physical cores to bind to.

3.3.3 Configuring the NIC

Purpose

When a NIC receives a large number of requests, it raises interrupts to notify the kernel of new data packets. The kernel then invokes the interrupt handler to copy the data packets from the NIC to memory. When the NIC has only one queue, only one core can process these copies at a time. The NIC multi-queue mechanism is introduced to remove this limit, so that different cores can fetch data packets from different NIC queues at the same time.

When the NIC multi-queue function is enabled, the OS uses the irqbalance service to determine which CPU core processes the network data packets in each NIC queue. If the CPU core that processes the interrupt and the NIC are not in the same NUMA node, cross-NUMA memory access is triggered. Therefore, place the CPU cores that process the NIC interrupts on the NUMA node where the NIC is located. This reduces the extra overhead caused by cross-NUMA memory access and improves network processing performance.

To optimize network performance, refer to the procedure below.

Procedure

Step 1 Stop the irqbalance service on the compute node.
service irqbalance stop

Step 2 Query the number of NIC queues (assuming that the OpenStack service uses the NIC enp3s0):
ethtool -l enp3s0

The preceding information shows that there are 16 queues.


Step 3 Check whether the CPU usage of the softirq process is too high. If yes, increase the number of queues. If not, go to Step 6 (query the NIC interrupt numbers).

Step 4 Dynamically adjust the number of queues to 32.

ethtool -L enp3s0 combined 32

NOTE

More queues do not necessarily mean better performance. Observe the CPU usage for processing software interrupts to determine whether a performance bottleneck exists.

Step 5 Check whether the number of software interrupts decreases.

Step 6 Query the NIC interrupt numbers.

cat /proc/interrupts | grep enp3s0
125: 23 0 ... enp3s0_qp0
126: 0 0 ... enp3s0_qp1
...

Step 7 Check the NUMA node where the NIC is located:

cat /sys/class/net/enp3s0/device/numa_node

Step 8 Bind each interrupt to a different CPU of that NUMA node based on the interrupt numbers. In the following command, xxx is an interrupt number obtained in Step 6, and the value written (0 in this example) is the CPU that the interrupt is bound to.

echo 0 > /proc/irq/xxx/smp_affinity_list

NOTE

If multiple NICs are used in the environment, you are advised to bind the interrupts of all NICs to cores using the preceding method to improve network processing performance.

----End
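The binding in Step 8 has to be repeated for every queue interrupt. The following is a minimal sketch of how the interrupt-to-CPU mapping could be planned in round-robin order. The function name plan_irq_affinity and its two arguments are hypothetical helpers introduced for illustration, not part of any tool; the function only prints the plan and does not write to /proc.

```shell
#!/bin/sh
# Hypothetical helper: print "IRQ CPU" pairs, assigning IRQs to CPUs round-robin.
# $1: space-separated IRQ numbers (from Step 6)
# $2: space-separated CPU IDs on the NIC's NUMA node (from Step 7)
plan_irq_affinity() (
    irqs=$1
    cpus=$2
    set -- $cpus                 # positional parameters now hold the CPU IDs
    ncpu=$#
    [ "$ncpu" -gt 0 ] || exit 1  # no CPUs given: nothing to plan
    i=0
    for irq in $irqs; do
        idx=$(( i % ncpu + 1 ))
        eval "cpu=\${$idx}"      # pick the idx-th CPU ID
        echo "$irq $cpu"
        i=$(( i + 1 ))
    done
)

# Possible use on the compute node (as root, NIC name as in the steps above):
#   irqs=$(grep enp3s0 /proc/interrupts | awk -F: '{print $1}')
#   plan_irq_affinity "$irqs" "0 1 2 3" | while read -r irq cpu; do
#       echo "$cpu" > "/proc/irq/$irq/smp_affinity_list"
#   done
```

The CPU list "0 1 2 3" in the usage comment is a placeholder; use the CPUs that actually belong to the NUMA node reported in Step 7.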

3.4 VM Tuning

3.4.1 Binding the VMs to Cores

Purpose

Multiple VMs run on a server, and each VM provides different services, so the VMs occupy resources to different extents. To prevent interference from adjacent VMs, the processes of different VMs need to be isolated from each other. In addition, OpenStack processes provide the network and storage services, so they also need to be isolated from other processes.

Procedure

Method 1: Use OpenStack for Configuration


Step 1 Set the value of vcpu_pin_set in the /etc/nova/nova.conf file to the list of CPU cores reserved for guest (VM) processes. For example:

vcpu_pin_set = "0-3"

Step 2 Restart the nova service.

systemctl restart openstack-nova-compute.service

Step 3 Set the flavor to 1:1 core binding. FLAVOR-NAME is the name of the flavor (instance type) that you created.

openstack flavor set FLAVOR-NAME --property hw:cpu_policy=dedicated

NOTE

The flavor can be configured based on the actual service scenario. The possible values of the cpu_policy option are as follows:

● shared (default): Physical CPUs are not exclusively used, and a vCPU can float among different physical CPUs.

● dedicated: Physical CPUs are exclusively used, and each vCPU of a VM is strictly bound to one physical CPU.

Step 4 Create a VM by using this flavor.

NOTE

In OpenStack core binding mode, you cannot explicitly specify which physical cores the vCPUs are bound to.

----End

Method 2: Use virsh for Configuration (Recommended)

With method 1, you cannot explicitly specify which cores the vCPUs are bound to. If you need to specify them based on service requirements, use virsh to explicitly bind vCPUs to cores. The procedure is as follows:

Step 1 Run the following command to query the VM instance name and the compute node to which the VM belongs. xxx indicates the VM name.

nova show xxx


Step 2 Run the following command on the compute node queried in Step 1 (compute1 in this example) to edit the XML configuration file of the VM. Replace instance-000008f1 with the instance name queried in Step 1.

virsh edit instance-000008f1

Step 3 Add the following settings to the XML file.

<domain type='kvm'>
  ...
  <vcpu placement='static' cpuset='4-7'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='4-7'/>
  </cputune>
  ...
</domain>


Step 4 Restart the VM.

virsh shutdown instance-000008f1
virsh start instance-000008f1

----End
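As an alternative to editing the XML by hand, the same per-vCPU pinning can be expressed with the virsh vcpupin and virsh emulatorpin subcommands. The sketch below only prints the commands so that they can be reviewed first. The function name print_pin_commands is a hypothetical helper, and the sketch assumes that vCPU N is pinned to the N-th core given; it does not set the cpuset attribute of the <vcpu> element.

```shell
#!/bin/sh
# Hypothetical helper: print virsh commands that pin vCPU N to the N-th core given.
# $1: libvirt domain name; remaining arguments: host core IDs, one per vCPU.
print_pin_commands() (
    domain=$1
    shift
    vcpu=0
    all=""
    for core in "$@"; do
        # --config stores the pinning in the persistent domain definition
        echo "virsh vcpupin $domain $vcpu $core --config"
        all="${all:+$all,}$core"     # build a comma-separated core list
        vcpu=$(( vcpu + 1 ))
    done
    # Keep the emulator threads on the same cores as the vCPUs
    echo "virsh emulatorpin $domain $all --config"
)

# Example: print_pin_commands instance-000008f1 4 5 6 7
```

Because the commands use --config, the pinning takes effect the next time the VM is started, which matches the restart in Step 4.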

Recommended Policies

● Scenario 1

When multiple VMs exist in the environment, 1:1 core binding performs better than range-based core binding.

cpuset='4-7' restricts the qemu-kvm thread and other worker threads to cores 4 to 7. If this parameter is not configured, the VM threads float across any cores, causing more cross-die and cross-chip overhead.

vcpupin binds each vCPU thread to one core. If vcpupin is not used, the vCPU threads may switch among cores 4 to 7, causing extra overhead.

● Scenario 2

When VMs have high requirements on memory bandwidth, binding vCPUs to cores across CPU clusters performs better than binding them to cores in the same cluster.

Kunpeng 920 series processors provide two super CPU clusters (SCCLs). Each SCCL contains six to eight CPU clusters, and each CPU cluster contains four cores. When vCPUs are bound to the four cores in the same CPU cluster, the cores compete with each other, and the L3 cache becomes a memory bandwidth bottleneck.

Therefore, you are advised to bind vCPUs to cores in multiple CPU clusters in the virtualization environment. When the load is light, this reduces competition for the L3 cache and allows the memory bandwidth to be used to the maximum extent. Binding vCPUs across as many CPU clusters as possible can dramatically reduce competition for L3 cache tags and improve memory bandwidth and CPU computing performance.

The following is an example of cross-cluster core binding:

<domain type='kvm'>
  ...
  <vcpu placement='static' cpuset='4,8,12,16'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='8'/>
    <vcpupin vcpu='2' cpuset='12'/>
    <vcpupin vcpu='3' cpuset='16'/>
    <emulatorpin cpuset='4,8,12,16'/>
  </cputune>
  ...
</domain>
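The cross-cluster core list in Scenario 2 can be derived mechanically. The following sketch assumes that the cores of one CPU cluster are numbered consecutively, which is consistent with the 4, 8, 12, 16 example but should be verified on the target machine. The function name cross_cluster_cores is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print a comma-separated core list with one core per CPU cluster.
# $1: first core, $2: cores per cluster (4 on Kunpeng 920 according to the text),
# $3: number of vCPUs to bind.
cross_cluster_cores() (
    core=$1
    step=$2
    count=$3
    list=""
    n=0
    while [ "$n" -lt "$count" ]; do
        list="${list:+$list,}$core"   # append the current core
        core=$(( core + step ))       # jump to the same position in the next cluster
        n=$(( n + 1 ))
    done
    echo "$list"
)

# Example: cross_cluster_cores 4 4 4 prints the cpuset used in the XML above.
```

The printed list can be used as the cpuset value of the <vcpu> and <emulatorpin> elements, with one <vcpupin> entry per listed core.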


3.4.2 Configuring the NUMA Affinity

Purpose

VMs can be bound to cores by socket or by NUMA node. However, the latency of a memory access depends on where the CPU core is located relative to the memory. The memory access latency in descending order is as follows: cross-CPU, cross-NUMA, and intra-NUMA. Therefore, avoid cross-NUMA and cross-chip memory access when applications are running. A good NUMA core binding policy reduces the memory access latency.

Procedure

Method 1: Use OpenStack for Configuration

Step 1 Set the flavor to NUMA affinity with one NUMA node. FLAVOR-NAME is the name of the flavor (instance type) that you created.

openstack flavor set FLAVOR-NAME --property hw:numa_nodes=1

NOTE

● If the value of numa_nodes is set to 1, the scheduler selects a compute node on which a single NUMA node meets the VM flavor configuration requirements.

● If the value of numa_nodes is greater than 1, the scheduler selects a compute node on which the number of NUMA nodes and the resources in those NUMA nodes meet the VM flavor configuration requirements.

Step 2 Create a VM by using the flavor.

----End

Method 2: Use virsh for Configuration (Recommended)

Step 1 View the NUMA node information on the physical machine.

numactl -H

NOTE

If the numactl tool is not installed, run the following command to install it:

yum install numactl

Step 2 Check the command output. In this example, the output indicates that CPUs 0 to 23 are on NUMA node 0 and CPUs 24 to 47 are on NUMA node 1.


Step 3 Edit the XML configuration file of the VM on the compute node and bind the vCPUs to cores on the same die. For details about how to query instances, see 3.4.1 Binding the VMs to Cores. The following is an example of the configuration file:

<domain type='kvm'>
  ...
  <vcpu placement='static' cpuset='4-7'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='4-7'/>
  </cputune>
  ...
  <numatune>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  ...
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='4' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0-3' memory='83886088' unit='KiB'/>
    </numa>
  </cpu>
  ...
</domain>

----End

Recommended Policy

Non-cross-die core binding combined with the strict memory allocation mode is recommended.

To improve VM performance, set <numatune> and <cputune> so that the vCPUs and their memory are on the same physical NUMA node.

For the memory allocation mode, use strict where possible. The strict mode does not allow cross-NUMA memory allocation. The preferred mode allocates memory from the specified NUMA node first and, if that memory is insufficient, allocates from another NUMA node.
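Before writing <cputune>, it can help to confirm that all chosen cores belong to one NUMA node. The following sketch parses the "node N cpus:" lines printed by numactl -H. The function names node_cpus and cores_on_node are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `numactl -H` output on stdin and print the CPU IDs
# of one NUMA node on a single line.  $1: NUMA node ID.
node_cpus() {
    awk -v node="$1" '$1 == "node" && $2 == node && $3 == "cpus:" {
        for (i = 4; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n")
    }'
}

# Hypothetical helper: succeed only if every core in $2 appears in the CPU list $1.
# Both arguments are space-separated lists of CPU IDs.
cores_on_node() (
    for core in $2; do
        case " $1 " in
            *" $core "*) ;;      # core found on the node
            *) exit 1 ;;         # core is on another node
        esac
    done
)

# Possible use on the compute node:
#   cpus=$(numactl -H | node_cpus 0)
#   cores_on_node "$cpus" "4 5 6 7" && echo "cores 4-7 are all on node 0"
```

If the check fails, the planned binding would cross NUMA nodes, which conflicts with the strict policy recommended above.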


3.4.3 Configuring the Memory Huge Page Function

Purpose

The memory huge page function ensures that all memory of a VM is backed by huge pages on the host and is physically contiguous. This effectively reduces Translation Lookaside Buffer (TLB) misses and significantly improves the performance of memory-intensive services. If the VM uses huge memory pages, you can disable transparent huge pages to reduce overhead on the host and improve VM performance stability.

Procedure

Step 1 On the host compute node, check the huge page allocation on each NUMA node.

cat /sys/devices/system/node/node*/meminfo | grep HugePages_Total

If the value of HugePages_Total is 0, the memory huge page function is not enabled.

Step 2 If the memory huge page function is not enabled, enable it first. (If it is already enabled, skip this step and go to Step 4 to configure the VM to use huge memory pages.)

Add the following parameters to the kernel boot options in the /boot/efi/EFI/centos/grub.cfg file:

default_hugepagesz=512M hugepagesz=512M hugepages=300

NOTE

The parameters are described as follows:

● hugepages: defines the number of persistent huge pages that the kernel allocates at startup. The default value is 0, indicating that no persistent huge pages are allocated. The allocation succeeds only when the system has enough contiguous free pages. Pages reserved by this parameter cannot be used for other purposes.

● hugepagesz: defines the size of the huge pages that the kernel allocates at startup.

● default_hugepagesz: defines the default size of the huge pages that the kernel allocates at startup.

The following is an example:


NOTE

The huge page size varies depending on the OS. In this document, the huge page size for CentOS 7.6 is set to 512 MB. In virtualization scenarios, reserve 15% of the total memory for the host when configuring huge pages. Calculate the amount of huge page memory based on your service requirements and the memory configuration of the environment.

Step 3 Restart the server and check the memory huge page information.

cat /proc/sys/vm/nr_hugepages

Step 4 Configure the VM to use huge memory pages.

● Method 1: Use OpenStack for Configuration

openstack flavor set FLAVOR-NAME --property hw:mem_page_size=large

Create a VM by using this flavor.

NOTE

The flavor can be configured based on the actual service scenario. The possible values of the mem_page_size option are as follows:

– small (default): The VM uses small pages.
– large: Only huge memory pages are used for the VM, and the page size is the same as that of the huge pages on the physical machine.
– any: The program determines the memory pages to be used.
– pagesize: Specifies an explicit page size. The default unit is KB, and you can also specify the unit, for example, 4KB, 2MB, 2048, or 1GB.

● Method 2: Use virsh for Configuration

Edit the XML file on the compute node. For details about how to query instances, see 3.4.1 Binding the VMs to Cores. The following is an example of the configuration file:

<domain type='kvm'>
  ...
  <memoryBacking>
    <hugepages>
      <page size='524288' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  ...
</domain>

Step 5 Disable transparent huge pages on the physical machine to ensure stable performance.

echo never > /sys/kernel/mm/transparent_hugepage/enabled

NOTE

To enable transparent huge pages again, run the following command:

echo always > /sys/kernel/mm/transparent_hugepage/enabled

----End
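The 15% host reservation mentioned in the note for Step 2 can be turned into a value for the hugepages parameter with simple integer arithmetic. The following is a sketch under the assumption that all remaining memory is given to huge pages. The function name hugepage_count is a hypothetical helper, and the 256 GiB host in the example is illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: print how many huge pages fit after reserving memory for the host.
# $1: total host memory in MiB
# $2: huge page size in MiB (512 in this document)
# $3: percentage of memory reserved for the host (15 in this document)
hugepage_count() {
    # Integer arithmetic rounds down, so the result never exceeds the budget
    echo $(( $1 * (100 - $3) / 100 / $2 ))
}

# Example for a host with 256 GiB (262144 MiB) of memory:
#   hugepage_count 262144 512 15
# The total could be read from the running system, for example:
#   total_mib=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
```

For the 262144 MiB example, 85% of the memory is 222822 MiB, which holds 435 pages of 512 MiB.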

Recommended Policy

This function applies to scenarios where a small number of VMs are deployed. If a large number of VMs are deployed, use small memory pages instead.


Note

Huge memory pages can be allocated to VMs, but the VMs do not necessarily use them. If huge memory pages are not enabled for a VM, the VM treats the memory as small pages.

3.4.4 Upgrading GCC

Purpose

In CentOS 7.6, the default GCC version is 4.8.5 and the default glibc version is 2.17. Some software requires specific gcc and glibc versions to compile. Upgrading gcc and glibc in the VMs can improve the performance of some programs.

Procedure

The tables below describe the recommended versions of gcc and glibc and their dependencies in the VM OSs.

Compiler    Version          How to Obtain
gcc         7.3.0 or later   https://ftp.gnu.org/gnu/gcc/
glibc       2.27             https://ftp.gnu.org/gnu/libc/

Dependency  Version          How to Obtain
gmp         6.1.2            https://ftp.gnu.org/gnu/gmp/
mpfr        3.1.5            https://ftp.gnu.org/gnu/mpfr/
mpc         1.0.3            https://ftp.gnu.org/gnu/mpc/
isl         0.18             https://sourceforge.net/projects/libisl/files/

Step 1 Download the preceding software packages to the /home directory. The following is an example:

cd /home
wget https://ftp.gnu.org/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.gz
wget https://ftp.gnu.org/gnu/libc/glibc-2.27.tar.gz
wget https://ftp.gnu.org/gnu/gmp/gmp-6.1.2.tar.bz2
wget https://ftp.gnu.org/gnu/mpfr/mpfr-3.1.5.tar.gz
wget https://ftp.gnu.org/gnu/mpc/mpc-1.0.3.tar.gz
wget http://isl.gforge.inria.fr/isl-0.18.tar.bz2

Step 2 Install gmp.

cd /home
tar -xvf /home/gmp-6.1.2.tar.bz2
cd /home/gmp-6.1.2
./configure --prefix=/usr/local/gmp-6.1.2
make -j
make install


Step 3 Install mpfr.

cd /home
tar -zxvf mpfr-3.1.5.tar.gz
cd /home/mpfr-3.1.5
./configure --prefix=/usr/local/mpfr-3.1.5 --with-gmp=/usr/local/gmp-6.1.2
make -j
make install

Step 4 Install mpc.

cd /home
tar -zxvf mpc-1.0.3.tar.gz
cd /home/mpc-1.0.3
./configure --prefix=/usr/local/mpc-1.0.3 --with-gmp=/usr/local/gmp-6.1.2 --with-mpfr=/usr/local/mpfr-3.1.5
make -j
make install

Step 5 Install isl.

cd /home
tar -xvf /home/isl-0.18.tar.bz2
cd /home/isl-0.18
yum -y install gmp-devel
./configure --prefix=/usr/local/isl-0.18 --with-gmp=/usr/local/gmp-6.1.2
make
make install

Step 6 Install gcc. (Version 7.3.0 is used as an example.)

cd /home
tar -zxvf gcc-7.3.0.tar.gz
cd /home/gcc-7.3.0
./configure --prefix=/usr/local/gcc-7.3.0 --enable-languages=c,c++,fortran --enable-shared --enable-linker-build-id --without-included-gettext --enable-threads=posix --disable-multilib --disable-nls --disable-libsanitizer --disable-browser-plugin --enable-checking=release --build=aarch64-linux --with-gmp=/usr/local/gmp-6.1.2 --with-mpfr=/usr/local/mpfr-3.1.5 --with-mpc=/usr/local/mpc-1.0.3 --with-isl=/usr/local/isl-0.18
export LD_LIBRARY_PATH=/usr/local/mpc-1.0.3/lib:/usr/local/gmp-6.1.2/lib:/usr/local/mpfr-3.1.5/lib:/usr/local/gcc-7.3.0/lib64:/usr/local/isl-0.18/lib:/usr/local/lib:/usr/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/gcc-7.3.0/bin:$PATH
make -j
make -j install

Step 7 Set the environment variables.

Add the following environment variables to the /etc/profile file:

export LD_LIBRARY_PATH=/usr/local/mpc-1.0.3/lib:/usr/local/gmp-6.1.2/lib:/usr/local/mpfr-3.1.5/lib:/usr/local/gcc-7.3.0/lib64:/usr/local/isl-0.18/lib:/usr/local/lib:/usr/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/gcc-7.3.0/bin:$PATH

Step 8 Make the environment variables take effect.

source /etc/profile

Step 9 Check the GCC version.

gcc -v

Step 10 Install glibc. glibc must be configured in a separate build directory, so create one first.

cd /home
tar -zxvf glibc-2.27.tar.gz
cd glibc-2.27
mkdir build
cd build
/home/glibc-2.27/configure --prefix=/usr/local/glibc-2.27
make -j
make install

Step 11 Set the environment variable.

Add the following environment variable to the /etc/profile file:

export PATH=/usr/local/glibc-2.27/bin:$PATH

Step 12 Make the environment variable take effect.

source /etc/profile

Step 13 Check the glibc version.

ldd --version

NOTE

If gcc is upgraded to 7.3.0, you can add the compilation option -march=armv8-a to CFLAGS and CPPFLAGS to use the Armv8 instruction set, which is compatible with the Kunpeng processors.

If gcc is upgraded to 9.1.0 or later, you can also add the compilation option -mtune=tsv110 to specify that the tsv110 pipeline is used.

----End
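The version-dependent options in the preceding note can be selected in a build script. The following is a minimal sketch; the function name kunpeng_cflags is a hypothetical helper, and it emits -march=armv8-a unconditionally, adding -mtune=tsv110 only for GCC 9.1 or later as the note describes.

```shell
#!/bin/sh
# Hypothetical helper: print the compiler flags from the note for a given GCC version.
# $1: GCC version string such as 7.3.0 or 9.1.0
kunpeng_cflags() (
    major=${1%%.*}
    case $1 in
        *.*) rest=${1#*.}; minor=${rest%%.*} ;;   # second version component
        *)   minor=0 ;;                           # version given as major only
    esac
    flags="-march=armv8-a"
    # -mtune=tsv110 is only added for GCC 9.1 or later
    if [ "$major" -gt 9 ] || { [ "$major" -eq 9 ] && [ "$minor" -ge 1 ]; }; then
        flags="$flags -mtune=tsv110"
    fi
    echo "$flags"
)

# Possible use in a build script:
#   CFLAGS="$(kunpeng_cflags "$(gcc -dumpversion)")"
#   export CFLAGS
```

Some GCC builds print only the major version for -dumpversion; in that case the sketch treats the minor version as 0, so passing the full version string explicitly is safer.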


4 Kunpeng BoostKit for Virtualization Performance Tuning Guide

4.1 Introduction

4.2 CentOS 7.6 Tuning

4.3 Low-Load V-Turbo Tuning

4.4 Tuning in Resource Fragmentation Scenarios

4.5 Change History

4.1 Introduction

Tuning Overview

This user guide applies to KVM-based virtualization scenarios.

This document is applicable to CentOS 7.6 and openEuler. In this document, 4.2 CentOS 7.6 Tuning only applies to CentOS 7.6.

Environment Configuration

Table 4-1 lists the recommended hardware.

Table 4-1 Hardware requirements

Category   Item     Requirement
Hardware   Server   TaiShan 200 server (model 2280)
           CPU      Kunpeng 920 processor
           NIC      TM210 with electrical ports
           DIMM     16 for the 2666 or 2933 model


Table 4-2 lists the software versions recommended for openEuler 20.09.

Table 4-2 Software versions recommended for openEuler 20.09

Name       Remarks
OS         ● Host: openEuler 20.09 for ARM (default kernel: 4.19)
           ● Guest: openEuler 20.09 for ARM (default kernel: 4.19)
Compiler   GCC 7.3.0
           glibc 2.27
libvirt    6.2.0
QEMU-KVM   4.1.0

Table 4-3 lists the software versions recommended for CentOS 7.6.

Table 4-3 Software versions recommended for CentOS 7.6

Name       Remarks
OS         ● Host: CentOS 7.6 for ARM (default kernel: 4.14)
           ● Guest: CentOS 7.6 for ARM (default kernel: 4.14)
Compiler   GCC 7.3.0
           glibc 2.27
libvirt    5.6.0
QEMU-KVM   4.0.1

4.2 CentOS 7.6 Tuning

4.2.1 Installing QEMU

QEMU needs to be installed only on the host.

Step 1 Clone the code repository from the Kunpeng community.

git clone https://github.com/kunpengcompute/qemu.git

Step 2 Switch the branch.

cd qemu
git checkout stable-4.0

Step 3 Create a build directory and go to this directory.

mkdir build && cd build

Step 4 Perform compilation and installation.

../configure --target-list=aarch64-softmmu
make install -j 16

Step 5 Check the QEMU version.

virsh version

The command output should show that the QEMU version is 4.0.1.

----End

4.2.2 Dual-layer Scheduling

Kernel Description

When the compiled CentOS 7.6 kernel is installed on the host machine and VM, the following features are supported to improve the overall system performance:

● vcpu preempt: optimizes vCPU scheduling to prevent vCPUs from failing to be scheduled in the case of CPU overcommitment.

● pvspinlock optimization: fixes the performance deterioration caused by long lock times on the vCPUs.

● guest vcpu topology: fixes the incorrect display of the CPU topology on a VM that runs the native CentOS 7.6 kernel.

CAUTION

The pvspinlock optimization feature is disabled by default. To enable it, add the arm_pvspin option to the kernel command line (cmdline) of the VM. After the feature is enabled, the following information is displayed in the message log of the VM:

Kernel Installation

The kernel must be installed on both the host and VM.

Step 1 Install the compilation dependencies. The following uses installation on a virtual host as an example. The VM installation type is Virtualization Host.


Step 2 Clone the kernel repository from the Kunpeng community.

git clone https://github.com/kunpengcompute/kernel-alt.git

Step 3 Download the kernel source code package and install it.

wget https://archive.kernel.org/centos-vault/altarch/7.6.1810/os/Source/SPackages/kernel-alt-4.14.0-115.el7a.0.1.src.rpm
rpm -ivh kernel-alt-4.14.0-115.el7a.0.1.src.rpm

Step 4 Copy the files to the specified directories.

cd kernel-alt
cp -f *patch ~/rpmbuild/SOURCES/
cp -f kernel-alt-4.14.0-aarch64.config ~/rpmbuild/SOURCES/
cp -f kernel-alt-4.14.0-aarch64-debug.config ~/rpmbuild/SOURCES/
cp -f kernel-alt.spec ~/rpmbuild/SPECS/

Step 5 Install dependencies.

yum -y install m4.aarch64 gcc.aarch64 xmlto.aarch64 asciidoc.noarch openssl-devel.aarch64 hmaccalc.aarch64 python-devel.aarch64 newt-devel.aarch64 perl-ExtUtils-Embed.noarch git.aarch64 elfutils-devel.aarch64 zlib-devel.aarch64 binutils-devel.aarch64 bison.aarch64 audit-libs-devel.aarch64 java-devel numactl-devel.aarch64 pciutils-devel.aarch64 ncurses-devel.aarch64 rpm-build

Step 6 Start the compilation.

cd ~/rpmbuild/SPECS/
rpmbuild -bb kernel-alt.spec

Step 7 Install the compiled kernel.

cd ~/rpmbuild/RPMS/aarch64
rpm -ivh kernel-4.14.0-115.el7.0.2.aarch64.rpm

Step 8 Set the GRUB boot option to the newly installed kernel and restart the system.

Step 9 Check the kernel.

uname -r

After the kernel is successfully installed, the command output shows the version of the newly installed kernel.

----End

4.3 Low-Load V-Turbo Tuning

XML Settings

Step 1 Set the number of vCPUs to the actual number of VMs.


Step 2 Set the value of threads to 2. (Number of vCPUs = Number of sockets x Number of cores x Number of threads)

Step 3 After the configuration, log in to the VM and check whether the configuration is successful.

lscpu

The configuration is successful if the following information is displayed:

----End
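The relationship in Step 2 (Number of vCPUs = Number of sockets x Number of cores x Number of threads) determines the cores value once the other three are fixed. The following sketch prints a matching <topology> element for the VM XML. The function name vcpu_topology is a hypothetical helper, and the 8-vCPU, single-socket values in the example are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: print a libvirt <topology> element for a given vCPU count.
# $1: number of vCPUs, $2: number of sockets, $3: threads per core (2 in Step 2)
vcpu_topology() (
    vcpus=$1
    sockets=$2
    threads=$3
    group=$(( sockets * threads ))
    # The vCPU count must split evenly into sockets x cores x threads
    if [ $(( vcpus % group )) -ne 0 ]; then
        echo "vCPU count $vcpus is not divisible by sockets x threads" >&2
        exit 1
    fi
    cores=$(( vcpus / group ))
    echo "<topology sockets='$sockets' cores='$cores' threads='$threads'/>"
)

# Example: vcpu_topology 8 1 2
```

For 8 vCPUs on one socket with two threads per core, the sketch yields four cores, so that 1 x 4 x 2 = 8.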

4.4 Tuning in Resource Fragmentation Scenarios

4.4.1 Enabling Memory Interleaving

Configuring the BIOS

Step 1 Enter the BIOS and choose Advanced > Memory Config.


Step 2 Set Die Interleaving to Enable.

Step 3 Log in to the host and check the number of NUMA nodes.

numactl -H

After the configuration is successful, the number of NUMA nodes is reduced by half. In this example, after memory interleaving is enabled, the number of NUMA nodes changes from 4 to 2.


----End

NOTE

The prerequisite for enabling memory interleaving is that there is a DIMM in each memory channel. Each CPU of the Kunpeng 920 processor has eight memory channels. If each CPU is configured with eight DIMMs, Advanced > Memory Config > Channel interleaving 3way is set to Enable by default. Otherwise, it is set to Disable. Memory interleaving is mainly intended for x86 memory configuration.

Constraints

This feature is used to reduce the memory fragmentation rate. After memory interleaving is enabled, the performance of memory-sensitive services deteriorates by about 10%.

4.4.2 Configuring Guest NUMA

XML Settings

Step 1 When configuring guest NUMA, you can specify the location of vNode memory on the host to implement memory block binding and vCPU binding, so that the vCPUs and memory of a vNode are on the same physical NUMA node. The VM configurations are as follows:

Figure 4-1 VM CPU binding configuration


Figure 4-2 VM NUMA configuration

Figure 4-3 VM NUMA configuration details

Step 2 After the VM is configured, log in to the VM and check the number of NUMA nodes.

numactl -H

In this example, the number of NUMA nodes is 4, and the number of CPUs on each NUMA node is as expected.

----End

Constraints

This feature allows the VM to detect the NUMA architecture of the host. The impact on performance depends on application characteristics.


4.5 Change History

Date        Description
2020-11-10  This issue is the first official release.


A Change History

Date        Description

2021-10-15  This issue is the eleventh official release. Added the openEuler content to 2 KVM Tuning Guide.

2021-09-15  This issue is the tenth official release. Added the openEuler content to 1 Docker Tuning Guide.

2021-08-11  This issue is the ninth official release. Optimized the code in 2.4.2 Setting 1:1 Core Binding and Same-Die Memory Access of the KVM Tuning Guide (CentOS 7.6).

2021-03-23  This issue is the eighth official release. Changed the solution name from "Kunpeng virtualization solution" to "Kunpeng BoostKit for Virtualization."

2020-12-24  This issue is the seventh official release. Modified 2.4.1 Binding the VM to Cores in the KVM Tuning Guide (CentOS 7.6).

2020-12-15  This issue is the sixth official release. Modified the recommended value of the memory refresh frequency in the BIOS settings from 64 ms to Auto.

2020-11-10  This issue is the fifth official release. Added 4 Kunpeng BoostKit for Virtualization Performance Tuning Guide.

2020-09-21  This issue is the fourth official release. Changed the solution name from "Kunpeng cloud platform solution" to "Kunpeng virtualization solution".

2020-06-24  This issue is the third official release. Added 3 OpenStack Tuning Guide (CentOS 7.6).

2020-05-15  This issue is the second official release. Modified the tuning flowcharts in 1.1 Introduction of the Docker Tuning Guide (CentOS 7.6) and in 2.1 Introduction of the KVM Tuning Guide (CentOS 7.6).

2020-03-20  This issue is the first official release.
