System Virtual Machines: Overview
Presented by Jongpil Lee
Contents
- Key Concepts
- Resource Virtualization – Processors
- Resource Virtualization – Memory
- Resource Virtualization – Input/Output
System Virtual Machines
- A system VM environment supports multiple system images simultaneously, each running its own operating system and associated application programs.
- The real resources of the host platform are shared among the guest systems by the virtual machine monitor (VMM).
- The VMM manages the allocation of, and access to, the hardware resources of the host platform.
[Figure: Linux, Windows, and Solaris guests, each running its own applications on a virtual Intel IA-32 machine, all supported by the virtual machine monitor (VMM) on Intel IA-32 hardware.]
Key Concepts (1): Outward Appearance
[Figure: shared hardware (CPU, memory, disk, printer, terminal controller, and a network controller connected to the network) alongside two sets of devices (display, keyboard, mouse, speaker, CD drive), one dedicated to user 1 and one dedicated to user 2.]
Key Concepts (2): State Management
[Figure: VMM memory holds the register values for VM1, VM2, and VM3. Two ways of making them available to the processor are shown.]
- Indirection: the VMM changes a pointer when a VM is activated.
    1. Load the register block pointer to point to the VM's registers in VMM memory.
    2. Load the program counter to point to the VM program and start execution.
    A register-to-register move then goes through the pointer:
        Load  temp <- reg_pointer, index(A)
        Store reg_pointer, index(B) <- temp
- Copying: the VMM copies register values into the processor registers when a VM is activated.
    1. Copy the register state from VMM memory.
    2. Load the program counter to point to the VM program and start execution.
    A register-to-register move is executed directly:
        Mov reg A -> reg B
    3. Copy the register state from the processor back to system memory.
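The two state-management schemes on this slide can be contrasted in a small Python model. This is only a sketch: the class `VMM`, its field names, and the use of dictionaries as register blocks are assumptions made for illustration, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass
class VMM:
    # Saved register blocks in "VMM memory", one per VM (hypothetical layout).
    blocks: dict = field(default_factory=dict)
    # Indirection: names the register block of the active VM.
    pointer: str = ""
    # Copying: the processor's own register file and the VM that owns it.
    cpu_regs: dict = field(default_factory=dict)
    active: str = ""

    def activate_by_indirection(self, vm):
        # Only the pointer changes; no register values are moved.
        self.pointer = vm

    def mov_indirect(self, src, dst):
        # "Mov A -> B" becomes a load and a store through the pointer.
        block = self.blocks[self.pointer]
        temp = block[src]
        block[dst] = temp

    def activate_by_copying(self, vm):
        # Write the outgoing VM's registers back, then load the incoming ones.
        if self.active:
            self.blocks[self.active] = dict(self.cpu_regs)
        self.cpu_regs = dict(self.blocks[vm])
        self.active = vm

    def mov_copy(self, src, dst):
        # The move runs directly on processor registers.
        self.cpu_regs[dst] = self.cpu_regs[src]
```

Indirection makes a VM switch cheap but turns every register access into a memory access; copying pays once per switch and then runs register operations at full speed.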
Key Concepts (3): Resource Control
- The VMM maintains overall control of all the hardware resources.
- Interval timer interrupt: instead of allowing the operating system in a virtual machine to field the timer interrupt, the VMM first handles the interrupt itself.
[Figure: timeline from "first VM active" to "next VM active". When the timer interrupt occurs:
    1. The VMM saves the architected state of the running VM.
    2. The VMM determines the next VM to be activated.
    3. The VMM sets the timer interval and enables interrupts.
    4. The VMM restores the architected state for the next VM.
    5. The VMM sets the PC to the timer interrupt handler of the OS in the next VM.]
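The timer-interrupt sequence above can be sketched in Python as follows. The round-robin policy, the class name `TimeSharingVMM`, and the reduction of the architected state to a single `pc` field are all assumptions for illustration; the slides do not specify a scheduling policy.

```python
from collections import deque


class TimeSharingVMM:
    def __init__(self, vm_names, quantum):
        self.ready = deque(vm_names)
        # Saved architected state per VM (reduced to a PC for brevity).
        self.saved_state = {name: {"pc": 0} for name in vm_names}
        self.quantum = quantum
        self.timer = quantum
        self.running = self.ready.popleft()
        self.cpu = dict(self.saved_state[self.running])

    def on_timer_interrupt(self, guest_timer_handler_pc):
        # 1. Save the architected state of the running VM.
        self.saved_state[self.running] = dict(self.cpu)
        # 2. Determine the next VM (round-robin is an assumption).
        self.ready.append(self.running)
        self.running = self.ready.popleft()
        # 3. Set the timer interval for the next time slice.
        self.timer = self.quantum
        # 4. Restore the architected state of the next VM.
        self.cpu = dict(self.saved_state[self.running])
        # 5. Enter the guest OS at its own timer interrupt handler.
        self.cpu["pc"] = guest_timer_handler_pc[self.running]
        return self.running
```

A real VMM would also record the interrupted PC where the guest's interrupt mechanism expects to find it; that detail is omitted here.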
Key Concepts (4): Native and Hosted Virtual Machines
- A native VM system
    - The VMM operates in a privilege mode higher than the mode of the guest virtual machines.
    - The privilege level of the guest OS is emulated by the VMM.
- A hosted VM system
    - The virtual machine system is installed on a host platform that is already running an existing OS.
    - The VMM uses the functions already available in the host OS to control and manage the resources desired by each of the virtual machines.
[Figure: software stacks, listed bottom to top, with the boundary between privileged and nonprivileged modes marked.
    Traditional uniprocessor system: hardware, OS, application
    Native VM system: hardware, VMM, guest OS, guest apps
    User-mode hosted VM system: hardware, host OS, VMM, guest OS, guest apps
    Dual-mode hosted VM system: hardware, host OS with VMM, guest OS, guest apps]
Key Concepts (5): IBM VM/370
- The virtual machine monitor of VM/370: the control program (CP)
- A single-user operating system: the conversational monitor system (CMS)
Resource Virtualization – Processor
- The key aspect of virtualizing a processor is the execution of the guest instructions, including both system-level and user-level instructions.
- Processor virtualization methods
    - Emulation: interpretation or binary translation (described in Chapter 2)
    - Direct native execution: possible only if the ISA of the host is identical to the ISA of the guest
- Trap
    - For a virtualizable ISA, a trap occurs naturally when an instruction needs to be emulated.
    - The trap handler jumps to an appropriate interpreter routine, interprets the single instruction, and returns control to the original program.
Resource Virtualization – Processor: Conditions for ISA Virtualizability (1)
- We restrict the discussion here to native system VMs.
- In a native system VM, the VMM runs in system mode, and all other software runs in user mode.
- The VMM keeps track of the intended mode of operation of a guest virtual machine, but it sets the actual native hardware mode to user mode whenever it executes instructions from the guest virtual machine.
Resource Virtualization – Processor: Conditions for ISA Virtualizability (2)
- The machine being virtualized is modeled as a 4-tuple S = <E, M, P, R>
    E: the executable storage
    M: the mode of operation
    P: the program counter
    R: the memory relocation bounds register
- A memory trap occurs if the address accessed by a program falls outside the bounds indicated by R.
- A privileged instruction is defined as one that traps if the machine is in user mode and does not trap if the machine is in system mode.
    - Load PSW (LPSW, IBM System/370): loads the processor status word (PSW) from a location in memory if the processor is in system mode. If it is not in system mode, the machine traps.
    - Set CPU Timer (SPT, IBM System/370): replaces the CPU interval timer with the contents of a location in memory if the CPU is in system mode, and traps if it is not.
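The 4-tuple model and the definition of a privileged instruction can be made concrete with a short Python sketch. The encoding of R as a `(base, bound)` pair, the representation of a PSW as a `(mode, pc)` tuple, and the names `Machine` and `Trap` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


class Trap(Exception):
    """Raised where the modeled hardware would trap."""


@dataclass
class Machine:
    E: list    # executable storage
    M: str     # mode of operation: "system" or "user"
    P: int     # program counter
    R: tuple   # relocation-bounds register, assumed to be (base, bound)

    def load(self, addr):
        base, bound = self.R
        # Memory trap: the address falls outside the bounds given by R.
        if not 0 <= addr < bound:
            raise Trap("memory")
        return self.E[base + addr]

    def lpsw(self, addr):
        # Privileged: traps in user mode, executes in system mode.
        if self.M != "system":
            raise Trap("privileged instruction")
        # The PSW is assumed to be stored as a (mode, pc) pair.
        self.M, self.P = self.load(addr)
```

For example, with `Machine(E=[0, 0, ("user", 5), 9], M="system", P=0, R=(2, 2))`, a first `lpsw(0)` switches the machine to user mode at PC 5, and a second `lpsw(0)` raises `Trap`.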
Resource Virtualization – Processor: Conditions for ISA Virtualizability (3)
- To specify instructions that interact with hardware, two categories of special instructions are defined.
- Control-sensitive instructions
    - Attempt to change the configuration of resources in the system
    - Ex) Load PSW, Set CPU Timer
- Behavior-sensitive instructions
    - Behavior or results produced depend on the configuration of resources
    - Ex) Load Real Address (LRA): takes a virtual address, translates it, and saves the corresponding real address in a specified general-purpose register. The behavior of this instruction depends on the state (mapping) of the real memory resource.
    - Ex) Pop Stack into Flags Register (POPF): pops the flags register from a stack held in memory. In user mode, this instruction overwrites all flags except the interrupt-enable flag; for the interrupt-enable flag, the instruction acts as a no-op when executed in user mode.
- Innocuous instructions: instructions that are neither control-sensitive nor behavior-sensitive
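The POPF behavior described above, where the result depends on the current mode but no trap occurs, can be illustrated with a simplified Python model. The function name `popf` and the two-mode simplification are assumptions; the real IA-32 rule also involves the I/O privilege level, which is ignored here. The mask 0x200 is the interrupt-enable flag (bit 9) of the IA-32 flags register.

```python
IF = 0x200  # interrupt-enable flag, bit 9 of the IA-32 flags register


def popf(flags, stack, mode):
    """Return the new flags value after popping one word from the stack."""
    new = stack.pop()
    if mode == "system":
        # All flags, including IF, are overwritten.
        return new
    # User mode: IF keeps its current value and nothing traps.
    return (new & ~IF) | (flags & IF)
```

Because the user-mode case completes silently, a guest OS running in user mode cannot change its interrupt-enable flag this way, and the VMM never learns that it tried.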
Resource Virtualization – Processor: Conditions for ISA Virtualizability (4)
Components of a virtual machine monitor:
    1. Dispatcher
    2. Allocator
    3. Interpreter routines
[Figure: a privileged instruction in the guest causes an instruction trap to the dispatcher. Instructions that attempt to change a machine resource (e.g., load relocation bounds register) are passed to the allocator. Instructions that do not change machine resources but access a privileged resource (e.g., IN, OUT, Write TLB) are passed to the interpreter routines, one routine per privileged instruction.]
Resource Virtualization – Processor: Conditions for ISA Virtualizability (5)
- The theorem regarding (efficient) VMM construction
- Theorem 1: A virtual machine monitor may be constructed if the set of sensitive instructions is a subset of the set of privileged instructions.
- An efficient virtual machine implementation can be constructed if instructions that could interfere with the functioning of the VMM always trap in user mode.
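The condition of Theorem 1 is a plain subset test, which the following Python sketch spells out. The function names are hypothetical, and the instruction sets in the usage note are small illustrative samples built from instructions named in these slides, not a full ISA.

```python
def critical_instructions(sensitive, privileged):
    """Sensitive instructions that do not trap in user mode."""
    return set(sensitive) - set(privileged)


def satisfies_theorem_1(sensitive, privileged):
    """True when every sensitive instruction is also privileged."""
    return not critical_instructions(sensitive, privileged)
```

With `sensitive = {"LPSW", "SPT"}` and `privileged = {"LPSW", "SPT"}` the condition holds. Adding `"POPF"` to the sensitive set while leaving it out of the privileged set makes the test fail and reports `{"POPF"}` as the offending instruction.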
Resource Virtualization – Processor: Conditions for ISA Virtualizability (6)
- The VMM interprets a sensitive instruction according to the prevailing status of the virtual system resources and the state of the virtual machine.
[Figure: guest OS code in the VM (user mode) executes a privileged instruction (LPSW), which traps to the dispatcher in the VMM code (privileged mode). The dispatcher invokes the LPSW routine:
    1. Change mode to privileged
    2. Check privilege level in VM
    3. Emulate instruction
    4. Compute target
    5. Restore mode to user
    6. Jump to target
Execution resumes in the guest at the next instruction (the target of LPSW).]
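The dispatcher and the LPSW routine from the figure can be sketched in Python as a trap-and-emulate step. Everything here is an assumption made for illustration: the `GuestVM` class, the `(mode, target)` encoding of a PSW in guest memory, and the string return values. The hardware mode switches of the routine (steps 1 and 5) have no counterpart in a software model and are left out.

```python
class GuestVM:
    def __init__(self, memory):
        self.memory = memory            # guest memory: address -> value
        self.virtual_mode = "system"    # mode the guest believes it is in
        self.pc = 0


def lpsw_routine(vm, operand_addr):
    # Check the privilege level recorded for the VM, not the hardware
    # mode, which is always user mode while guest code runs.
    if vm.virtual_mode != "system":
        return "reflect-trap-to-guest"
    # Emulate the instruction against the virtual machine state.
    mode, target = vm.memory[operand_addr]
    vm.virtual_mode = mode
    # Compute the target and resume the guest there.
    vm.pc = target
    return "resume"


# One interpreter routine per privileged instruction.
ROUTINES = {"LPSW": lpsw_routine}


def dispatcher(vm, opcode, operand_addr):
    # Entry point reached when a privileged instruction traps.
    return ROUTINES[opcode](vm, operand_addr)
```

If the guest believes it is in user mode when it issues LPSW, the routine does not emulate the instruction; the trap is the guest OS's to handle.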
Resource Virtualization – Processor: Conditions for ISA Virtualizability (7)
- Interpreting the SPT instruction
    - The VMM examines the contents of the location to be loaded into the CPU timer.
    - If t < T, t is loaded; otherwise T is loaded.
        t: the contents of the location
        T: the time remaining from the time allocated to the virtual machine itself
    - Meanwhile, the VMM keeps the time difference (t - T) in an internal table so that this time can be restored when the guest VM is again activated.
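The SPT rule above amounts to a single comparison. A minimal Python sketch follows; the function name `emulate_spt` and the choice to return the deferred time as the second element of a pair are assumptions.

```python
def emulate_spt(t, T):
    """Return (value loaded into the real timer, time deferred by the VMM).

    t: timer value requested by the guest
    T: time remaining in the VM's own allocation
    """
    if t < T:
        # The guest's interval ends before the VM's slice does.
        return t, 0
    # The slice ends first; remember the remainder for the next activation.
    return T, t - T
```

For example, a guest request of 250 time units with 100 units left in the slice loads 100 into the real timer and records 150 to be restored later.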
Resource Virtualization – Processor: Recursive Virtualization
- The concept of running the virtual machine system on a copy of itself
- Two effects usually restrict the ability to create an efficient recursively virtualizable system.
- Theorem 2: A conventional third-generation computer is recursively virtualizable if (a) it is virtualizable and (b) a VMM without any timing dependencies can be constructed for it.
[Figure: the VMM runs on the hardware in privileged mode. In nonprivileged mode it supports several virtual machines, one of which runs a second-level VMM that in turn supports its own virtual machines.]
Resource Virtualization – Processor: Handling Problem Instructions
- The POPF instruction is sensitive but not privileged.
- Critical instruction (sensitive but not privileged)
    - It does not generate a trap in user mode.
    - It violates the virtualizability condition of Theorem 1.
- An additional set of steps must be taken in order to implement a system virtual machine (with possible loss of some efficiency).
    - The VMM could intercept POPF and other critical instructions if all guest software were interpreted instruction by instruction.
    - Techniques related to those described in Chapters 2 and 3 can be used to reduce the inefficiency.
[Figure: the VMM's scanner and patcher inserts a code patch at each discovered critical instruction; the patch is a control transfer (e.g., a trap) to the VMM.]
Resource Virtualization – Processor: Patching of Critical Instructions
- One way to discover critical instructions
    - The VMM takes control at the head of each guest basic block and scans instructions in sequence until the end of the basic block is reached.
    - If a critical instruction is found, it is replaced with a trap to the VMM.
    - Another trap back to the VMM is placed at the end of the basic block.
- To reduce overhead, the trap at the end of a scanned basic block can be replaced by the original branch or jump instruction.
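The scan-and-patch procedure can be sketched in Python over a toy instruction stream. Representing instructions as bare opcode strings, the sets `CRITICAL` and `BRANCHES`, and the `"TRAP"` marker are all assumptions for illustration; a real scanner decodes machine code.

```python
CRITICAL = {"POPF"}                      # sensitive but not privileged
BRANCHES = {"JMP", "JZ", "CALL", "RET"}  # instructions that end a basic block


def scan_and_patch(code, start):
    """Scan one basic block from `start`; return (patched code, saved originals)."""
    patched = list(code)
    saved = {}
    i = start
    while i < len(patched):
        op = patched[i]
        if op in BRANCHES:
            # End of the basic block: trap back to the VMM here.
            saved[i] = op
            patched[i] = "TRAP"
            break
        if op in CRITICAL:
            # Critical instruction: replace it with a trap to the VMM.
            saved[i] = op
            patched[i] = "TRAP"
        i += 1
    return patched, saved
```

The `saved` table is what later lets the VMM emulate the original instruction at a trap site, or put the original branch back once the trap at the end of the block is no longer needed.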
Resource Virtualization – Processor: Caching Emulation Code
- The overhead of VMM interpretation can become a problem when the frequency of sensitive instructions requiring interpretation is high.
[Figure: in the patched program, blocks 1, 2, and 3 each contain a control transfer (e.g., a trap) to the VMM. The VMM uses a translation table to find the corresponding block of specialized emulation routines in the code cache. A whole code section can be emulated in the code cache, and two critical instructions can be combined into a single block.]
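The role of the translation table in the figure can be sketched as a lookup that builds an emulation routine only on first use. The class `CodeCache`, the `build_routine` callback, and keying the table by guest program counter are assumptions; the slides do not say how the table is indexed.

```python
class CodeCache:
    def __init__(self, build_routine):
        # Translation table: guest PC of a patched site -> cached routine.
        self.translation_table = {}
        # Callback that generates a specialized emulation routine (hypothetical).
        self.build_routine = build_routine
        self.builds = 0

    def lookup(self, guest_pc):
        routine = self.translation_table.get(guest_pc)
        if routine is None:
            # Miss: generate the routine once and remember it.
            routine = self.build_routine(guest_pc)
            self.translation_table[guest_pc] = routine
            self.builds += 1
        return routine
```

Repeated traps from the same site then reuse the cached routine, so the cost of analyzing the instruction is paid once rather than on every execution.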
Resource Virtualization – Memory: Virtual Memory Support in a System Virtual Machine Environment (1)
- Each of the guest VMs has its own set of virtual memory tables.
- Address translation in each of the guest VMs transforms addresses in its virtual address space to locations in real memory.
    - Real memory: a guest VM's illusion of physical memory
    - Physical memory: the hardware memory
- A guest's real memory address must undergo a further mapping to determine the address in the physical memory of the host hardware.
- The VMM maintains a real map table mapping the real pages to physical pages.
Resource Virtualization – Memory: Virtual Memory Support in a System Virtual Machine Environment (2)
[Figure: the virtual memories of Program 1 and Program 2 map into the real memory of VM1, and the virtual memory of Program 3 maps into the real memory of VM2. Each VM's real memory maps in turn into the physical memory of the system; some real pages are not mapped to physical memory.]

Page Table for Program 1 (on VM1)
    Virtual page    Real page
    1000            5000
    2000            1500

Page Table for Program 2 (on VM1)
    Virtual page    Real page
    1000            Not mapped
    4000            3000

Page Table for Program 3 (on VM2)
    Virtual page    Real page
    1000            500
    4000            3000

Real Map Table for VM1
    VM1 real page   Physical page
    1500            500
    3000            Not mapped
    5000            1000

Real Map Table for VM2
    VM2 real page   Physical page
    500             3000
    3000            Not mapped
Resource Virtualization – Memory: Virtualizing Architected Page Tables (1)
- The virtual-to-physical mapping is kept by the VMM in shadow page tables, one for each of the guest VMs.
- These tables are the ones actually used by the hardware to translate virtual addresses and to keep the TLB up to date.
- To make this method work, the page table pointer register is virtualized.
[Figure: shadow page tables kept by the VMM. The page table pointer selects the shadow table of the currently active program, Program 1 on VM1.]

Shadow Page Table for Program 1 on VM1
    Virtual page    Physical page
    1000            1000
    2000            500
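A shadow page table is the composition of a guest page table (virtual to real) with the VMM's real map table (real to physical). The Python sketch below shows that composition using the values given for Program 1 and VM1 in these slides; the function name and the use of `None` for "not mapped" are assumptions.

```python
def build_shadow_table(guest_page_table, real_map_table):
    """Compose virtual->real with real->physical; skip unmapped pages."""
    shadow = {}
    for vpage, rpage in guest_page_table.items():
        if rpage is None:
            continue  # not mapped by the guest OS
        ppage = real_map_table.get(rpage)
        if ppage is None:
            continue  # real page not currently in physical memory
        shadow[vpage] = ppage
    return shadow


# Values taken from the page table and real map table figures.
program1 = {1000: 5000, 2000: 1500}
program2 = {1000: None, 4000: 3000}
vm1_real_map = {1500: 500, 3000: None, 5000: 1000}

shadow1 = build_shadow_table(program1, vm1_real_map)
shadow2 = build_shadow_table(program2, vm1_real_map)
```

Here `shadow1` comes out as `{1000: 1000, 2000: 500}`, matching the shadow table shown for Program 1 on VM1, while `shadow2` is empty because neither of Program 2's pages currently reaches physical memory.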
Resource Virtualization – Memory: Virtualizing Architected Page Tables (2)
- Page fault handling
    - If the page is mapped in the virtual memory table of the guest OS
        - The VMM has moved the accessed real page to its own swap space.
        - The VMM brings the real page back into physical memory.
        - The VMM updates the real map table and the affected shadow table(s).
    - If the page is not mapped in the guest
        - The VMM transfers control to the trap handler of the guest, indicating a page fault.
        - The guest OS then issues instructions to modify its page table.
        - The VMM intercepts these requests.
        - The VMM updates the page table and also updates the mapping in the appropriate shadow page table.
Resource Virtualization – Memory: Virtualizing an Architected TLB
- To virtualize the TLB, the VMM maintains a copy of each guest's TLB contents and also manages the real TLB.
- Real TLB management
    - The VMM rewrites the TLB whenever a guest VM is activated.
        - The VMM translates the real addresses in the virtual TLB to physical addresses for the physical TLB.
        - The VMM copies the VM's virtual TLB entries into the physical TLB.
        - This has a fairly high overhead.
    - The VMM leverages the address space identifiers (ASIDs).
[Figure: virtual TLBs, ASID map table, and real TLB.]

Virtual TLB of VM1 (ASID mapping: Prog. 1 = ASID 3, Prog. 2 = ASID 7)
    Virtual page    Real page    ASID
    2000            1500         3
    4000            3000         7
    1000            5000         3

Virtual TLB of VM2 (ASID mapping: Prog. 1 = ASID 3)
    Virtual page    Real page    ASID
    1000            3000         3

ASID Map Table
    Virtual ASID    Real ASID
    VM1:3           9
    VM1:7           ---
    VM2:3           4

Real TLB
    Virtual page    Physical page    ASID
    1000            3000             4
    2000            500              9
    1000            1000             9
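The ASID approach can be sketched as a translation from virtual TLB entries to real TLB entries: the real page is mapped through the real map table, and the guest's virtual ASID is mapped through the ASID map table. The function name, the tuple layout `(virtual page, page, ASID)`, and the decision to skip entries without a real ASID or a physical page are assumptions; the sample values follow the VM1 tables in these slides.

```python
def to_real_tlb(vm, virtual_tlb, real_map, asid_map):
    """Translate one VM's virtual TLB entries into real TLB entries."""
    real_tlb = []
    for vpage, rpage, virtual_asid in virtual_tlb:
        real_asid = asid_map.get((vm, virtual_asid))
        ppage = real_map.get(rpage)
        if real_asid is None or ppage is None:
            # No real ASID assigned, or the real page is not resident.
            continue
        real_tlb.append((vpage, ppage, real_asid))
    return real_tlb


vm1_virtual_tlb = [(2000, 1500, 3), (4000, 3000, 7), (1000, 5000, 3)]
vm1_real_map = {1500: 500, 3000: None, 5000: 1000}
asid_map = {("VM1", 3): 9, ("VM2", 3): 4}

vm1_entries = to_real_tlb("VM1", vm1_virtual_tlb, vm1_real_map, asid_map)
```

Because each guest address space gets its own real ASID, entries belonging to different VMs can sit in the real TLB together and need not be rewritten on every VM switch.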
Resource Virtualization – Input/Output: Virtualizing Devices
- Dedicated devices
    - Some I/O devices are dedicated to a particular guest VM, or at least are switched from one guest to another on a very long time scale.
    - The device itself does not necessarily have to be virtualized.
    - Requests to and from the device could theoretically bypass the VMM and go directly to the guest operating system.
- Partitioned devices
    - A very large disk, for example, can be partitioned into several smaller virtual disks that are then made available to the virtual machines as dedicated devices.
Resource Virtualization – Input/Output: Virtualizing Devices
- Shared devices
    - Some devices, such as a network adapter, can be shared among a number of guest VMs at a fine time granularity.
    - Each guest may have its own virtual state related to its usage of the device, e.g., a virtual network address. This state information is maintained by the VMM for each guest VM.
- Nonexistent physical devices
    - Virtual devices "attached" to a virtual machine for which there is no corresponding physical device
    - For example, a network adapter that is used for communicating with other virtual machines on the same platform
Resource Virtualization – Input/Output: Virtualizing Devices
- Spooled devices
    - Virtualization of a spooled device can be performed by using a two-level spool table approach.

Virtual Machine 1 Spool Table
    Program    Status       Location    Real loc    Size
    A          Printed      1000        11000       400
    B          Completed    2000        12000       200
    C          Running      3000        13000       200
    D          Completed    4000        14000       500

Virtual Machine 2 Spool Table
    Program    Status       Location    Real loc    Size
    P          Running      1000        21000       400
    Q          Completed    2000        22000       800

VMM Spool Table
    VM    Program    Status      Real loc    Size
    1     A          Printed     30000       400
    2     Q          Printing    31000       800
    1     B          Waiting     31800       200
    1     D          Waiting     30400       500
Resource Virtualization – Input/Output: Virtualizing I/O Activity
[Figure: layered I/O path. The application issues system calls to the operating system, the operating system issues driver calls to the I/O drivers, and the drivers perform physical memory and I/O operations on the hardware. The VMM can be placed at any of these interfaces.]
- An application program makes a device-independent I/O request.
- The operating system converts the device-independent request into calls to device driver routines.
- A device driver takes care of the device-specific aspects of performing an I/O transaction.
- The VMM can intercept a guest's I/O action and convert it from a virtual device action to a real device action at any of three interfaces:
    - The system call interface
    - The device driver interface
    - The operation-level interface
Resource Virtualization – Input/Output: Virtualizing I/O Activity
- Virtualizing at the I/O operation level
    - The privileged nature of the I/O operations makes them easy for the VMM to intercept, because they trap in user mode.
- Virtualizing at the device driver level
    - If the VMM can intercept the call to the virtual device driver, it can convert the virtual device information to the corresponding physical device and redirect the call to a driver program for the physical device.
    - This requires that the VMM developer have some knowledge of the guest operating system and its internal device driver interfaces.
- Virtualizing at the system call level
    - The virtualization process could be made more efficient by intercepting the initial I/O request at the OS interface, the ABI.
    - The entire I/O action could then be done by the VMM.
Resource Virtualization – Input/Output: Input/Output Virtualization and Hosted Virtual Machines
- An I/O request from a guest virtual machine is converted by the native-mode portion of the VMM into a user application request made to the host.
- An advantage of a hosted virtual machine
    - It is not necessary to provide device drivers in the VMM; the actual device drivers do not have to be incorporated as part of the VMM.
- Components that form a dual-mode hosted virtual machine system
    - VMM-n (native): intercepts traps due to privileged instructions or patched critical instructions encountered in a virtual machine
    - VMM-u (user): makes resource requests to the host OS
    - VMM-d (driver): provides a means of communication between the other two components