Chapter 8: System Virtual Machines
2005.11.9
Dong In Shin
Distributed Computing System Laboratory
Seoul National Univ.
Contents
1. Performance Enhancement of System VMs
2. Case Study: VMware Virtual Platform
3. Case Study: The Intel VT-x Technology
4. Case Study: Xen
Performance Enhancement of System Virtual Machines
Reasons for Performance Degradation
• Setup
• Emulation: some guest instructions must be emulated (usually via interpretation) by the VMM.
• Interrupt handling
• State saving
• Bookkeeping: e.g., the accounting of time charged to a user
• Time elongation
Instruction Emulation Assists
• The VMM emulates a privileged instruction using a routine whose behavior depends on whether the virtual machine is supposed to be executing in system mode or in user mode; see the sketch below.
• Hardware assists can check the virtual machine's mode and perform the appropriate actions.
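To make the idea concrete, here is a minimal C sketch of such an emulation routine, with hypothetical names (struct vcpu, emulate_privileged, inject_guest_exception); it illustrates the dispatch, not any particular VMM's code.

    #include <stdint.h>

    enum vmode { VIRT_USER, VIRT_SYSTEM };

    struct vcpu {
        enum vmode mode;    /* mode the guest believes it is running in */
        /* ... architected guest register state ... */
    };

    extern void emulate_privileged(struct vcpu *v, uint32_t opcode);
    extern void inject_guest_exception(struct vcpu *v);

    /* Trap handler for a privileged guest instruction. The dispatch is on
       the guest's VIRTUAL mode, not the real processor mode (the guest
       always runs in real user mode under classic virtualization). */
    void priv_instr_trap(struct vcpu *v, uint32_t opcode)
    {
        if (v->mode == VIRT_SYSTEM)
            emulate_privileged(v, opcode);   /* act on the virtual machine state */
        else
            inject_guest_exception(v);       /* reflect a privilege fault into the guest */
    }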
Virtual Machine Monitor Assists
• Context switch: hardware is used to save and restore registers.
• Decoding of privileged instructions: hardware assists decode the privileged instructions.
• Virtual interval timer: the virtual counter is decremented by an amount the VMM estimates from the amount that the real timer decrements (see the sketch below).
• Adding to the instruction set: a number of new instructions that are not part of the ISA of the machine.
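As an illustration of the virtual-interval-timer assist, the following C sketch (all names hypothetical) charges the guest an estimated share of the real timer's decrement:

    #include <stdint.h>

    struct vtimer {
        int64_t count;      /* the guest's virtual interval-timer value */
    };

    extern void deliver_virtual_timer_interrupt(void);

    /* Called when the real interval timer ticks while this guest runs.
       'real_decrement' is what the real timer counted down; 'share' is
       the VMM's estimate of the fraction attributable to this guest. */
    void vtimer_tick(struct vtimer *t, int64_t real_decrement, double share)
    {
        t->count -= (int64_t)((double)real_decrement * share);
        if (t->count <= 0) {
            deliver_virtual_timer_interrupt();
            t->count = 0;   /* the guest reloads the timer in its handler */
        }
    }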
Improving Performance of the Guest System
• Non-paged mode: the guest OS disables dynamic address translation and defines its real address space to be as large as the largest virtual address space; page frames are mapped to fixed real pages.
• The guest OS no longer has to exercise demand paging.
• No double paging, and no potential conflict between the paging decisions of the guest OS and those of the VMM.
Double Paging
• Two independent layers of paging interact and can perform poorly:
 – the guest OS incorrectly believes a page to be in physical memory;
 – the VMM believes an unneeded page is still in use;
 – the guest evicts a page despite available physical memory.
Pseudo-page-fault handling
• A page fault in a VM system is either a fault in the VM's own page table or a fault in the VMM's page table.
• Pseudo-page-fault handling process (sketched below):
 – The VMM initiates the page-in operation from the backing store and triggers a guest 'pseudo page fault'; the VMM does not suspend the guest.
 – The guest OS suspends the faulting user process.
 – On completion of the page-in operation, the VMM notifies the guest pseudo-page-fault handler again, and the guest OS handler wakes up the blocked user process.
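A sketch of the VMM side of that protocol in C (start_async_pagein and deliver_pseudo_fault are hypothetical helpers); the point is that only the faulting guest process blocks, while the guest OS itself keeps running:

    #include <stdint.h>

    extern void start_async_pagein(uint64_t guest_page);  /* backing-store I/O */
    extern void deliver_pseudo_fault(int vm, uint64_t guest_page, int complete);

    /* The guest touched a page that the VMM has paged out. */
    void vmm_handle_page_fault(int vm, uint64_t guest_page)
    {
        start_async_pagein(guest_page);           /* do NOT suspend the guest */
        deliver_pseudo_fault(vm, guest_page, 0);  /* guest suspends one process */
    }

    /* The backing-store read has completed. */
    void vmm_pagein_complete(int vm, uint64_t guest_page)
    {
        deliver_pseudo_fault(vm, guest_page, 1);  /* guest wakes that process */
    }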
Other Enhancements
• Spool files: without a special mechanism, the VMM must intercept the I/O commands and work out that several virtual machines are simultaneously attempting to send jobs to the I/O devices. Handshaking lets the VMM pick up a spool file and merge it into its own buffer.
• Inter-virtual-machine communication: communication between two physical machines involves processing message packets through several protocol layers on the sender and receiver sides. This process can be streamlined, simplified, and made faster when the two machines are virtual machines on the same host platform.
Specialized Systems
• Virtual-equals-real (V=R) virtual machine: the host address space representing the guest real memory is mapped one-to-one to the host real memory address space.
• Shadow-table bypass assist: the guest page tables can point directly to physical addresses if the dynamic address translation hardware is allowed to manipulate the guest page tables.
• Preferred-machine assist: allows a guest OS to operate in system mode rather than user mode.
• Segment sharing: the code segments of the operating system are shared among the virtual machines, provided the operating system code is written in a reentrant manner.
Generalized Support for Virtual Machines
• Interpretive Execution Facility (IEF): the processor directly executes most of the functions of the virtual machine in hardware; an extreme case of a VM assist.
• Interpretive-execution entry and exit:
 – Entry: Start Interpretive Execution (SIE): the software gives up control to the hardware IEF and the processor enters interpretive-execution mode.
 – Exit: a host interrupt, or an interception caused by unsupported instructions, an exception during the execution of an interpreted instruction, or certain special cases.
Interpretive Execution Entry and Exit
[Figure: The VMM software issues SIE to enter interpretive-execution mode; the processor exits either to the host interrupt handler (exit for host interrupt) or back to the VMM for emulation (exit for interception).]
Full-virtualization Versus Para-virtualization
• Full virtualization provides a total abstraction of the underlying physical system and creates a complete virtual system in which the guest operating system can execute.
 – No modification is required in the guest OS or applications; neither is aware of the virtualized environment.
• Advantages
 – Streamlines the migration of applications and workloads between different physical systems.
 – Complete isolation of different applications, which makes this approach highly secure.
• Disadvantages
 – Performance penalty.
• Examples: Microsoft Virtual Server and VMware ESX Server.
Full-virtualization Versus Para-virtualization
• Para-virtualization presents a software interface to virtual machines that is similar but not identical to that of the underlying hardware.
 – This technique requires modifications to the guest OSes that run on the VMs; the guest OSes are aware that they are executing on a VM.
• Advantages
 – Near-native performance.
• Disadvantages
 – Some limitations, including several insecurities such as exposure of guest OS cached data, unauthenticated connections, and so forth.
• Example: the Xen system.
Case Study: VMware Virtual Platform
VMware Virtual Platform
• A popular virtual machine infrastructure for IA-32-based PCs and servers.
• An example of a hosted virtual machine system; VMware's native-virtualization product is VMware ESX Server, but this chapter is limited to the hosted system, VMware GSX Server (VMware 2001).
• Challenges
 – The IA-32 environment is difficult to virtualize efficiently.
 – The openness of the system architecture.
 – Easy installation.
VMware's Hosted Virtual Machine Model
Processor Virtualization
• Critical instructions in the Intel IA-32 architecture are not efficiently virtualizable:
 – Protection system references: reference the storage protection system, memory system, or address relocation system (e.g., mov ax, cs).
 – Sensitive register instructions: read or change resource-related registers and memory locations (e.g., POPF).
• Problem: sensitive instructions executed in user mode do not execute as expected unless they are emulated.
• Solution: the VM monitor substitutes the instruction with another instruction sequence that emulates the action of the original code (see the POPF sketch below).
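For example, POPF executed in user mode silently drops changes to the interrupt-enable (IF) flag instead of trapping, so the substituted code must call an emulation routine such as the hypothetical one sketched here in C:

    #include <stdint.h>

    #define EFLAGS_IF (1u << 9)     /* interrupt-enable flag */

    struct vcpu {
        uint32_t eflags;            /* guest's architected EFLAGS image */
        int      virt_if;           /* virtual interrupt-enable state */
    };

    /* Called by the code the monitor substitutes for a guest POPF;
       'popped' is the value the guest popped from its stack. */
    void emulate_popf(struct vcpu *v, uint32_t popped)
    {
        /* Track the IF bit in software; the real IF stays under VMM control. */
        v->virt_if = (popped & EFLAGS_IF) != 0;
        v->eflags  = popped & ~EFLAGS_IF;
        /* If virtual interrupts are now enabled, pending virtual
           interrupts can be delivered to the guest at this point. */
    }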
Input/Output Virtualization
• The PC platform supports many more devices and device types than any other platform.
• Emulation in the VMMonitor: converts the guest's IN and OUT I/O instructions into new I/O operations; requires some knowledge of the device interfaces.
• New capability for devices through an abstraction layer: the VMApp can insert a layer of abstraction above the physical device.
 – Advantage: reduces performance losses due to virtualization (e.g., a virtual Ethernet switch between a virtual NIC and a physical NIC).
Using the Services of the Host Operating System
• An I/O request from the guest is converted into a host OS call.
• Advantages
 – No limitations on the VMM's access to the host OS's I/O features.
 – Support for running performance-critical applications.
Memory Virtualization
• Paging requests of the guest OS are not directly intercepted by the VMM but are converted into disk reads/writes; the VMMonitor translates them into requests on the host OS through the VMApp.
• Page replacement policy of the host OS: the host could replace critical pages of the VM system while it competes with other host applications, so the VMDriver pins the VM's critical pages in the host virtual memory system.
VMware ESX Server
• A native VM: a thin software layer designed to multiplex hardware resources among virtual machines, providing higher I/O performance and complete control over resource management.
• Full virtualization: for servers running multiple instances of unmodified operating systems.
Page Replacement Issues
• Problem of double paging: unintended interactions between the native memory management policies of the guest operating systems and those of the host system.
• Ballooning: reclaims the pages considered least valuable by the operating system running in a virtual machine.
 – A small balloon module is loaded into the guest OS as a pseudo-device driver or kernel service.
 – The module communicates with the ESX Server via a private channel.
Ballooning in VMware ESX Server
• Inflating a balloon (when the server wants to reclaim memory; see the sketch below):
 – The driver allocates pinned physical pages within the VM, increasing memory pressure in the guest OS, which reclaims space to satisfy the driver's allocation request.
 – The driver communicates the physical page number of each allocated page to the ESX Server.
• Deflating a balloon: frees up memory for general use within the guest OS.
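A minimal sketch of the guest-side balloon module in C; this illustrates the mechanism, not VMware's driver, and guest_alloc_pinned_page, guest_free_pinned_page, and tell_hypervisor are assumed interfaces:

    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t pfn_t;    /* guest physical frame number */

    extern pfn_t guest_alloc_pinned_page(void);        /* guest kernel service */
    extern void  guest_free_pinned_page(pfn_t pfn);    /* guest kernel service */
    extern void  tell_hypervisor(int give, pfn_t pfn); /* private channel */

    #define BALLOON_MAX 65536
    static pfn_t  balloon[BALLOON_MAX];
    static size_t balloon_size;

    /* Inflate: pin guest pages and report them, so the hypervisor can
       reuse the machine memory that backs them. */
    void balloon_inflate(size_t npages)
    {
        while (npages-- && balloon_size < BALLOON_MAX) {
            pfn_t pfn = balloon[balloon_size] = guest_alloc_pinned_page();
            balloon_size++;
            tell_hypervisor(1, pfn);   /* host reclaims the backing frame */
        }
    }

    /* Deflate: get frames re-backed and return the pages to the guest. */
    void balloon_deflate(size_t npages)
    {
        while (npages-- && balloon_size > 0) {
            pfn_t pfn = balloon[--balloon_size];
            tell_hypervisor(0, pfn);   /* host re-backs the frame */
            guest_free_pinned_page(pfn);
        }
    }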
Virtualizing I/O Devices on VMware Workstation
• Virtual devices supported by VMware: PS/2 keyboard, PS/2 mouse, floppy drive, IDE controllers with ATA disks and ATAPI CD-ROMs, a Sound Blaster 16 sound card, serial and parallel ports, virtual BusLogic SCSI controllers, AMD PCnet Ethernet adapters, and an SVGA video controller.
• Procedure: intercept the I/O operations issued by the guest OS (IA-32 IN and OUT instructions) and emulate them in either the VMM or the VMApp (sketched below).
• Drawbacks: virtualizing I/O devices incurs overhead from world switches between the VMM and the host, and from handling the privileged instructions used to communicate with the hardware.
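A C sketch of the dispatch just described; the port range and helper names are illustrative only:

    #include <stdint.h>

    extern uint32_t vmm_emulate_nic(uint16_t port, uint32_t val, int is_in);
    extern uint32_t world_switch_to_vmapp(uint16_t port, uint32_t val, int is_in);

    /* Called when the guest executes IN (is_in != 0) or OUT (is_in == 0). */
    uint32_t handle_port_io(uint16_t port, uint32_t val, int is_in)
    {
        /* Performance-critical device models can run inside the VMM... */
        if (port >= 0x1000 && port < 0x1020)      /* hypothetical NIC range */
            return vmm_emulate_nic(port, val, is_in);

        /* ...everything else is emulated host-side by the VMApp, at the
           cost of a world switch between the VMM and the host. */
        return world_switch_to_vmapp(port, val, is_in);
    }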
Case Study: The Intel VT-x (Vanderpool) Technology
Overview
• The VT-x (Vanderpool) technology for IA-32 processors enhances the performance of VM implementations through hardware enhancements to the processor.
• Main feature: a new VMX mode of operation, split into VMX root and non-root operation.
 – VMX root operation: fully privileged, intended for the VM monitor; adds new instructions (the VMX instructions).
 – VMX non-root operation: not fully privileged, intended for guest software; reduces guest software privilege without relying on rings.
Technological Overview
[Figure: Transitions between VMX root and non-root operation. The VMM enters VMX operation with vmxon, starts a guest (VM1, VM2) with vmlaunch, and re-enters it with vmresume; VM exits from VM1 or VM2 return control to the VMM in root operation, and vmxoff leaves VMX operation.]
VT-x Operations
[Figure: In legacy IA-32 operation, software uses rings 0 through 3. Under VT-x, the VMM runs in VMX root operation and each VM (VM 1 ... VM n) runs its own rings 0 through 3 in VMX non-root operation; VMXON, VMLAUNCH, and VMRESUME enter a guest, a VM exit returns to the VMM, and each guest's state is held in its own VMCS (VMCS1 ... VMCSn).]
Capabilities of the Technology
• A key aspect: elimination of the need to run all guest code in user mode.
• Maintenance of state information: a major source of overhead in a software-based solution.
 – A hardware technique allows all of the state-holding data elements to be mapped to their native structures.
 – VMCS (Virtual Machine Control Structure): the hardware implementation takes over the tasks of loading and unloading the state from its physical locations.
Virtual Machine Control Structure (VMCS)
• A control structure in memory; only one VMCS is active per virtual processor at any given time.
• VMCS payload (used by the monitor loop sketched below):
 – VM-execution, VM-exit, and VM-entry controls.
 – Guest and host state.
 – VM-exit information fields.
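The control flow implied by these fields can be sketched as the following monitor loop in C; vmxon, vmlaunch, and vmresume name the real VMX instructions (issued via assembly in practice, with the host-state area arranging the "return" on a VM exit), while the other helpers are hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    extern void     vmxon(void *vmxon_region);       /* enter VMX root operation */
    extern void     vmcs_load_and_setup(void *vmcs); /* controls + guest/host state */
    extern bool     vmlaunch(void);                  /* first entry to this guest */
    extern bool     vmresume(void);                  /* subsequent entries */
    extern uint32_t read_exit_reason(void);          /* VM-exit information field */
    extern bool     handle_exit(uint32_t reason);    /* emulate; keep running? */

    void run_guest(void *vmxon_region, void *vmcs)
    {
        vmxon(vmxon_region);           /* processor now in VMX root operation */
        vmcs_load_and_setup(vmcs);     /* one VMCS active per virtual processor */

        bool launched = false;
        for (;;) {
            /* Enter VMX non-root operation; in this sketch the call "returns"
               when the guest takes a VM exit back to root operation. */
            if (!(launched ? vmresume() : vmlaunch()))
                break;                 /* VM entry failed */
            launched = true;
            if (!handle_exit(read_exit_reason()))
                break;                 /* guest halted or fatal exit */
        }
    }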
Case Study: Xen Virtualization
Xen Design Principles
• Support for unmodified application binaries is essential.
• Supporting full multi-application operating systems is important.
• Paravirtualization is necessary to obtain high performance and strong resource isolation.
Xen Features
• Secure isolation between VMs.
• Resource control and QoS.
• Only the guest kernel needs to be ported; all user-level apps and libraries run unmodified (Linux 2.4/2.6, NetBSD, FreeBSD, WinXP).
• Execution performance is close to native.
• Live migration of VMs between Xen nodes.
Xen 3.0 Architecture
Xen para-virtualization
• On the x86 port (arch xen/x86), privileged instructions are replaced with Xen hypercalls; notifications are delivered to domains from Xen using an asynchronous event mechanism.
• The OS is modified to understand the virtualized environment:
 – wall-clock time vs. virtual processor time (Xen provides both types of alarm timer);
 – real resource availability is exposed to the guest.
• The Xen hypervisor is an additional protection domain between guest OSes and I/O devices.
X86 Processor Virtualization
• Xen runs in ring 0 (most privileged); rings 1 and 2 are used for the guest OS, ring 3 for user space.
• Xen lives in the top 64 MB of the linear address space; segmentation is used to protect Xen, because switching page tables is too slow on standard x86.
• Hypercalls jump to Xen in ring 0; a guest OS may install a 'fast trap' handler.
• MMU virtualization: shadow vs. direct mode.
Para-virtualizing the MMU
• The guest OS allocates and manages its own page tables, using hypercalls to change the page-table base.
• The Xen hypervisor is responsible for trapping accesses to the virtual page tables, validating updates, and propagating changes.
• Xen must validate page-table updates before use; updates may be queued and batch processed (see the sketch below), with validation rules applied to each PTE (a guest may map only pages it owns).
• XenoLinux implements a balloon driver that adjusts a domain's memory usage by passing memory pages back and forth between Xen and XenoLinux.
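A C sketch of such batching against Xen's public mmu_update interface (struct mmu_update carries the machine address of the PTE and its new value); the hypercall stub and queue management here are simplified:

    #include <stdint.h>

    struct mmu_update {
        uint64_t ptr;   /* machine address of the PTE to update */
        uint64_t val;   /* new PTE contents */
    };

    extern int HYPERVISOR_mmu_update(struct mmu_update *req, unsigned count,
                                     unsigned *done, uint16_t domid);

    #define DOMID_SELF 0x7FF0U   /* apply to this domain's own tables */
    #define QUEUE_LEN  64

    static struct mmu_update queue[QUEUE_LEN];
    static unsigned queued;

    /* Flush the queue: one hypercall validates and applies many updates. */
    void flush_pte_updates(void)
    {
        unsigned done = 0;
        if (queued &&
            HYPERVISOR_mmu_update(queue, queued, &done, DOMID_SELF) != 0) {
            /* Xen rejected an update, e.g. the guest tried to map a page
               it does not own; a real guest would recover or panic here. */
        }
        queued = 0;
    }

    /* Queue one PTE write; batching amortizes the hypercall cost. */
    void queue_pte_update(uint64_t pte_machine_addr, uint64_t new_val)
    {
        queue[queued].ptr = pte_machine_addr;
        queue[queued].val = new_val;
        if (++queued == QUEUE_LEN)
            flush_pte_updates();
    }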
MMU virtualization
Writable Page Tables
I/O Architecture
• Asynchronous buffer-descriptor rings in shared memory (sketched below).
• Xen I/O spaces delegate to guest OSes protected access to specified hardware devices.
• The guest OS passes buffer information vertically through the system; Xen performs validation checks.
• Xen supports a lightweight event-delivery mechanism that is used for sending asynchronous notifications to a domain.
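A minimal C sketch of such a shared-memory descriptor ring, in the spirit of Xen's split-driver rings but with illustrative field names rather than Xen's exact ring.h layout:

    #include <stdint.h>

    #define RING_SIZE 64u   /* power of two keeps index arithmetic simple */

    struct req  { uint64_t buf_ref; uint32_t len; uint32_t op; };
    struct resp { uint32_t id; int32_t status; };

    struct io_ring {        /* lives in a page shared between two domains */
        volatile uint32_t req_prod, req_cons;
        volatile uint32_t resp_prod, resp_cons;
        struct req  req[RING_SIZE];
        struct resp resp[RING_SIZE];
    };

    /* Guest side: publish one request descriptor. */
    int push_request(struct io_ring *r, struct req q)
    {
        if (r->req_prod - r->req_cons == RING_SIZE)
            return -1;                      /* ring full */
        r->req[r->req_prod % RING_SIZE] = q;
        __sync_synchronize();               /* publish data before the index */
        r->req_prod++;
        /* ...then notify the driver domain over an event channel. */
        return 0;
    }

    /* Driver-domain side: consume one request if one is pending. */
    int pop_request(struct io_ring *r, struct req *out)
    {
        if (r->req_cons == r->req_prod)
            return -1;                      /* ring empty */
        *out = r->req[r->req_cons % RING_SIZE];
        __sync_synchronize();
        r->req_cons++;
        return 0;
    }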
Data Transfer: I/O Descriptor Rings
Device Channel Interface
Performance