Xen Summit AMD 2010
VM Memory Allocation Schemes and PV NUMA Guests
Dulloor Rao
Agenda
● Motivation
● VM memory allocation strategies – CONFINED, SPLIT, STRIPED
● AUTOMATIC (default) allocation scheme
● PV NUMA Guests
● Summary
Motivation – NUMA Overheads
Motivation – NUMA Overheads
● CPU0 and CPU1 are Hyper-Threads.
● CPU0 and CPU2 are on the same node.
● CPU0 and CPU8 are on different nodes.
● Overheads are due to both Cache Hierarchy (L1/L2/LLC) and Memory Organization (NUMA)
● Modified Cache Coherency State – Cacheline is present only in the current cache and is dirty. The cacheline is written back to main memory before any reads.
● Substantial overhead in accessing remote node's memory.
Motivation – NUMA-related OS Optimizations (Linux as example)
● OS employs many optimizations to reduce inter-node memory accesses – memory management, scheduler, OS data-structures, etc.
● OS defines multiple NUMA allocation policies (MPOL_{DEFAULT/BIND/PREFERRED/INTERLEAVE}) to suit different applications. DEFAULT is local allocation.
● Significant performance improvement from system-level NUMA optimizations.
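The behavior of the four mempolicy modes can be sketched as follows. This is an illustrative model only, not the kernel's implementation; the function name and parameters are mine.

```python
# Illustrative sketch of how Linux's four mempolicy modes pick a node
# for a newly allocated page (not kernel code; names are assumptions).
def place_page(policy, local_node, node_set, page_index, preferred=None):
    if policy == "MPOL_DEFAULT":
        return local_node                    # allocate on the faulting CPU's node
    if policy == "MPOL_BIND":
        return node_set[0]                   # restricted to the bound node set
    if policy == "MPOL_PREFERRED":
        return preferred                     # try the preferred node first
    if policy == "MPOL_INTERLEAVE":
        return node_set[page_index % len(node_set)]  # round-robin striping
    raise ValueError(policy)
```

Under MPOL_INTERLEAVE over nodes [0, 1], successive pages land on nodes 0, 1, 0, 1, and so on.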
Motivation – NUMA-related Application Optimizations (Linux)
● DEFAULT memory policy (of allocating from local node) and a NUMA-aware scheduler reduce the inter-node accesses.
● Libraries (numactl on Linux) are provided to select appropriate memory placement policy for specific application requirements.
● CONCLUSION – NUMA-related optimizations at OS-level and Application-level are too important and too many to ignore or discard.
Motivation – Virtualization on NUMA platforms (Issues)
● Ad-hoc and Minimum-Effort VM memory allocation schemes.
● For instance, Xen tries to allocate all of a VM's memory from a single memory node and pins the VM to that node, for a one-to-one mapping between a VM and a node.
● Not always possible to allocate from a single node – VM size, node memory fragmentation, etc.
● Dynamic memory interfaces (such as memory ballooning) can still disrupt the mapping by allocating from some other node.
Motivation – Virtualization on NUMA platforms (Issues)
VM Memory Allocation Strategies
● CONFINED: Allocate the entire VM memory from a single node. Goal: Maximize performance.
● SPLIT: Allocate the VM memory from a set of nodes by splitting equally across the nodes. Goal: Maximize performance (with Enlightenment).
● STRIPED: Interleave the VM memory across a set of nodes. Goal: Predictable (average) performance.
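The three strategies can be modeled as different mappings from guest pages to physical nodes. The sketch below is a hypothetical illustration (function names and the free-page bookkeeping are mine, not Xen's allocator):

```python
# Illustrative model: map each guest page to a physical NUMA node
# under the three strategies (not Xen source).
def allocate(strategy, vm_pages, free_pages):
    """free_pages: dict of node id -> free page count."""
    ids = sorted(free_pages)
    if strategy == "CONFINED":
        # whole VM from the first single node that can hold it
        node = next(n for n in ids if free_pages[n] >= vm_pages)
        return [node] * vm_pages
    if strategy == "SPLIT":
        # contiguous equal shares across the node set
        share = vm_pages // len(ids)
        layout = [n for n in ids for _ in range(share)]
        layout += [ids[-1]] * (vm_pages - len(layout))  # remainder to last node
        return layout
    if strategy == "STRIPED":
        # page-granularity interleaving across the node set
        return [ids[i % len(ids)] for i in range(vm_pages)]
    raise ValueError(strategy)
```

For an 8-page VM on two nodes, SPLIT yields two contiguous 4-page halves, while STRIPED alternates nodes page by page.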
VM Memory Allocation Strategies - CONFINED
VM Memory Allocation Strategies - SPLIT
VM Memory Allocation Strategies - STRIPED
Automatic VM Memory Allocation Scheme
● TRY: Allocate CONFINED using Best-Fit-Decreasing (BFD).
● TRY: Allocate SPLIT using Best-Fit-Decreasing (BFD), if the guest is NUMA-enabled. Enlighten the guest.
● Allocate STRIPED using First-Fit-Increasing (FFI).
● BFD returns the minimal subset of nodes.
● FFI returns the maximal subset of nodes. Used with STRIPED to reduce the fragmentation of free node memory.
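The two node-selection heuristics can be sketched as follows. This is my reconstruction of the idea, not Xen source code:

```python
# Sketch of the two packing heuristics described above (illustrative only).
def bfd_nodes(vm_pages, free_pages):
    """Best-Fit-Decreasing: fill nodes from most free memory down,
    so CONFINED/SPLIT VMs land on a minimal subset of nodes."""
    chosen, need = [], vm_pages
    for node in sorted(free_pages, key=free_pages.get, reverse=True):
        if need <= 0:
            break
        chosen.append(node)
        need -= free_pages[node]
    return chosen if need <= 0 else None  # None: VM does not fit

def ffi_nodes(vm_pages, free_pages):
    """First-Fit-Increasing: take nodes from least free memory up,
    yielding a maximal subset -- spreading a STRIPED VM so that no
    single node's free memory is heavily fragmented."""
    chosen, need = [], vm_pages
    for node in sorted(free_pages, key=free_pages.get):
        chosen.append(node)
        need -= free_pages[node]
        if need <= 0:
            break
    return chosen if need <= 0 else None
```

On the same node state, FFI selects at least as many nodes as BFD, which is the point: BFD minimizes spread for locality, FFI maximizes it to even out free-memory fragmentation.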
VM Memory Allocation Strategy - SPLIT
● Used to construct a strict one-to-one mapping between virtual nodes and physical nodes.
● HVM: Export the VM memory layout using ACPI tables. VM constructs virtual nodes.
● PV: Export the VM memory layout using Virtual NUMA Enlightenment. VM constructs and maintains virtual nodes.
PV NUMA Guest - Enlightenment
PV NUMA Guest - Construction of Virtual Nodes
● Guest reads the Virtual NUMA Enlightenment using a hypercall.
● Guest constructs the (virtual) nodes and (virtual) cpu-to-node mappings.
● Guest (virtual) node distances reflect the actual distances between the underlying physical nodes.
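The guest-side construction step can be sketched as below. All field names are assumptions for illustration; the real enlightenment layout is defined by the hypervisor interface.

```python
# Sketch of guest-side virtual-node construction from the enlightenment
# data (field names are hypothetical, not the actual interface).
def build_virtual_nodes(enl):
    vnodes = {}
    for vnode_id, n in enumerate(enl["nodes"]):
        vnodes[vnode_id] = {
            "pfn_range": (n["start_pfn"], n["end_pfn"]),  # node's memory range
            "vcpus": n["vcpus"],                          # vcpu-to-node mapping
        }
    # virtual node distances mirror the underlying physical distances
    return vnodes, enl["distances"]
```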
PV NUMA Guest – Construction of Virtual Nodes
PV NUMA Guest – Maintenance of Virtual Nodes
● Dynamic memory interfaces can increase/decrease/exchange the VM memory reservations, e.g., ballooning (table in slide 7).
● Modify the interfaces to use Virtual NUMA Enlightenment. Maintain the strict mapping between Virtual and Physical nodes.
PV NUMA Guest – Maintenance of Virtual Nodes
PV NUMA Guest – Maintenance of Virtual Nodes
● Strict approach could lead to starvation in CONFINED/SPLIT VMs.
● Under memory pressure, relax the strict one-to-one mapping between virtual and physical nodes.
● Provide a mechanism to the guests to look-up physical node-id corresponding to a guest physical address.
● Periodically sweep through the VM memory to converge back to the original strict mapping (the sweep continues indefinitely).
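The sweep mechanism can be sketched as below. Helper names are hypothetical: pages whose backing node has drifted from the virtual node's target (e.g., after a relaxed allocation under memory pressure) are exchanged back toward the original mapping.

```python
# Sketch of the periodic convergence sweep (helper names are assumptions).
def sweep(p2m, target_node, node_of_mfn, exchange_to_node):
    """Walk the guest's p2m; exchange any page whose physical node no
    longer matches the virtual node's target. Returns pages moved."""
    moved = 0
    for gpfn, mfn in enumerate(p2m):
        want = target_node(gpfn)          # node the strict mapping expects
        if node_of_mfn(mfn) != want:      # page drifted to a remote node
            p2m[gpfn] = exchange_to_node(gpfn, want)
            moved += 1
    return moved                          # 0 means fully converged
```

A second sweep over the same memory moves nothing, i.e., the mapping has converged.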
Results – linpack benchmark
Summary
● VM Memory Allocation Strategies for NUMA – CONFINED/SPLIT/STRIPED.
● Automatic VM Memory Allocation Scheme.
● NUMA Guests with SPLIT strategy:
    ● HVM – Inform using SLIT/SRAT ACPI tables
    ● PV – Inform using Enlightenment
● PV NUMA Guests:
    ● Construction of Virtual Nodes
    ● Maintenance of Virtual Nodes (e.g., Ballooning)
Questions?
Thank You !