Virtualization in Linux
Atul Bansal, Manish Pal, Pulkit Gambhir
Virtualization in a nutshell
Virtualization: running multiple machines on a single piece of hardware
"Real" hardware is invisible to the OS
The OS only sees an abstracted picture
Only the Virtual Machine Monitor (VMM) talks to the hardware
More formally …
A framework for dividing the resources of a machine into multiple execution environments by using techniques such as:
1. H/W & S/W partitioning
2. Time-sharing
3. Partial or complete machine simulation
4. Emulation …
In general, we implement an M-N mapping (M: real resources, N: virtual resources), e.g. multitasking (1-N), cluster computing (M-1).
Motivations
Save on costs: many servers consolidated onto 1 machine; running legacy software
Improve on security: protection of data by placing it on separate virtual machines
Development and debugging platform
More motivations
Hardware independence of code (Java VM)
Compatibility issues tackled
Server migration is eased
Error and attack containment
Dynamic resource sharing
It looks cool too….
Issues in virtualization
Some interfaces were not designed with virtualization in mind (e.g. processor privileges): the VMM needs to carry out all privileged instructions itself
Need for an extra level of segmentation of memory (between virtual machines): done entirely by the VMM; guest OSes only see an abstraction of the page table, not the real one
Issues in virtualization
Resource sharing: map all VM requests to the same network card, same DMA controller, etc.
Design and management of communication between different virtual machines
Need to show abstracted hardware which has no physical equivalent (emulation)
And despite all that ….
Need transparency and near-real-machine performance!
An extremely hard task (Bochs is a very good example of less-than-perfect performance)
Example : VMWare architecture
Case Study : XEN
Virtual Architecture
Basic virtual architecture of Xen:
CPU state
Exception
Interrupt handling
Time
Memory
Devices
CPU State
Xen provides each guest OS with virtual CPUs (only 1 real CPU).
All privileged CPU state is handled by Xen.
Guest OSes are not permitted to perform privileged operations.
A hypercall interface is provided to the guest OS to execute privileged operations on the CPU through Xen.
Hypercalls
Analogous to the system calls provided by any OS, except that the software interrupt handler vectors to an entry point within Xen.
Even to set up its interrupt vector table, the OS must invoke Xen hypercalls.
Basically, any privileged operation on the CPU is performed through a hypercall to Xen.
Virtual IDT
A virtual IDT is provided to the guest OS for setting up its interrupt vector table.
A guest OS can submit a table of trap handlers to Xen via the set_trap_table hypercall.
The exception stack frame presented to a virtual trap handler is identical to its native equivalent.
Interrupt Handling
Interrupts are virtualized by mapping them to event channels.
They are delivered to the guest OS using a callback supplied via the set_callbacks hypercall.
Guest OS can map these events onto its standard interrupt dispatch mechanisms.
Xen is responsible for determining the guest OS that will handle each physical interrupt source.
Time
Time is important in virtualization because a guest OS needs to be aware of both ‘real time’ and ‘virtual time’ (time of execution).
Xen exports timestamps for system time and wall-clock time to guest operating systems through a shared page of memory.
Time Consistency
All timestamps need to be updated and read atomically.
Xen stores a version number in the shared info page, which is incremented before and after updating the timestamps.
A guest can be sure that it read a consistent state by reading the version number before and after reading the timestamps and checking that the two values are equal.
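The version-number check above can be sketched as a small single-threaded simulation. This is only an illustration, not Xen's code: the SharedInfo class and its field names are hypothetical stand-ins for the shared info page, and the extra "version is even" test is an inference from the increment-before-and-after rule (an odd value would mean an update is in progress). A real guest would also need memory barriers between the reads.

```python
from dataclasses import dataclass


@dataclass
class SharedInfo:
    """Hypothetical stand-in for the time fields of the shared info page."""
    version: int = 0
    system_time: int = 0
    wall_clock: int = 0


def writer_update(page, system_time, wall_clock):
    """Hypervisor side: bump the version before and after the update."""
    page.version += 1            # odd value: update in progress
    page.system_time = system_time
    page.wall_clock = wall_clock
    page.version += 1            # even value: update complete


def reader_snapshot(page, max_retries=1000):
    """Guest side: retry until the version is unchanged (and even) around the read."""
    for _ in range(max_retries):
        before = page.version
        snapshot = (page.system_time, page.wall_clock)
        after = page.version
        if before == after and before % 2 == 0:
            return snapshot
    raise RuntimeError("could not read a consistent snapshot")
```

The retry loop is the design point: the reader never blocks the writer, it simply discards a snapshot that may have been torn by a concurrent update.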
Event Channels
Event channels are the basic primitive provided by Xen for event notifications.
The Xen equivalent of a hardware interrupt.
Each channel stores one bit of information; the event of interest is signaled by transitioning this bit from 0 to 1.
Notifications are received by a guest via an up-call from Xen.
Event Channels (Implementation)
The kernel shared info page (shared_info_t) contains two bitfields for event channels:
unsigned long evtchn_pending[…..];
unsigned long evtchn_mask[…..];
These two specify, respectively, whether there is an event pending (evtchn_pending) and whether the event channel is masked (evtchn_mask).
For masked channels, no events will be delivered.
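The interplay of the pending and mask bitfields can be sketched as follows. This is a simplified model under stated assumptions: the EventChannels class, its method names, the 64-bit word size and the number of words are all hypothetical; only the field names evtchn_pending and evtchn_mask come from the slide.

```python
BITS_PER_WORD = 64  # assumption: unsigned long is 64 bits wide


class EventChannels:
    """Hypothetical model of the two event-channel bitfields."""

    def __init__(self, num_words=4):
        self.evtchn_pending = [0] * num_words
        self.evtchn_mask = [0] * num_words

    def signal(self, port):
        """Mark an event as pending by setting its bit (0 -> 1)."""
        word, bit = divmod(port, BITS_PER_WORD)
        self.evtchn_pending[word] |= 1 << bit

    def mask(self, port):
        word, bit = divmod(port, BITS_PER_WORD)
        self.evtchn_mask[word] |= 1 << bit

    def unmask(self, port):
        word, bit = divmod(port, BITS_PER_WORD)
        self.evtchn_mask[word] &= ~(1 << bit)

    def deliverable(self):
        """Return pending, unmasked ports and clear their pending bits."""
        ports = []
        for word in range(len(self.evtchn_pending)):
            ready = self.evtchn_pending[word] & ~self.evtchn_mask[word]
            self.evtchn_pending[word] &= ~ready
            bit = 0
            while ready:
                if ready & 1:
                    ports.append(word * BITS_PER_WORD + bit)
                ready >>= 1
                bit += 1
        return ports
```

A masked event is not lost in this model: its pending bit stays set and the event becomes deliverable once the channel is unmasked.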
Virtual CPU Setup
Any guest OS needs to set up a virtual CPU on which it executes.
This includes installing a vector table on the virtual IDT for handling interrupts, page faults, etc.
The guest OS must set up a pair of hypervisor callbacks (notification and entry points for Xen).
Hypercalls for CPU Setup
set_callbacks(………………………..)
The above hypercall allows a guest OS to set up the hypervisor callbacks.
set_trap_table(trap_info_t *table)
The above hypercall allows a guest OS to set up its IDT.
A further hypercall is provided for the management of virtual CPUs:
vcpu_op(……..)
This hypercall can be used to bootstrap VCPUs, to bring them up and down, and to test their current status.
Start of Day
The start-of-day environment for guest operating systems is different to that provided by the underlying hardware.
Processor is already executing in protected mode with paging enabled.
Domain 0 is created and booted by Xen itself.
Start of Day
For all domains other than dom0, the analogue of the boot-loader is the domain builder.
The domain builder is user-space software running in domain 0.
The domain builder is responsible for building the initial page tables for a domain and loading its kernel image at the appropriate virtual address.
XEN Scheduling
Similar to traditional Linux schedulers that divide CPU time among userland processes, Xen schedules resources between VMs.
It is like context switching between kernels.
Xen includes kernel boot-time options for scheduling.
Scheduling Algorithms
Atropos: soft real-time scheduler; guarantees absolute CPU shares
Round Robin: characterized by a “quantum” of time
Borrowed Virtual Time: proportional fair shares of CPU time; “penalizes” domains that block often; ctx_allow: like the “quantum” above
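The “quantum” idea behind the Round Robin entry can be sketched with a toy scheduler. This is a generic round-robin illustration, not Xen's scheduler code; the function name, the demands dictionary and the domain names are hypothetical.

```python
from collections import deque


def round_robin(demands, quantum):
    """Return the sequence of (domain, time_run) slices.

    demands maps a domain name to the CPU time it still needs;
    each domain runs for at most `quantum` before going to the back of the queue.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    queue = deque((name, need) for name, need in demands.items() if need > 0)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            queue.append((name, remaining - run))
    return timeline
```

For example, with a quantum of 2, a domain needing 5 units and one needing 2 units alternate until the shorter one finishes, after which the longer one runs alone.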
Scheduling Algorithms
sEDF: provides weighted CPU sharing; uses real-time algorithms to ensure time guarantees; uses weights as well as slices and periods for scheduling and sharing
System Calls and Scheduling
Some Scheduling System Calls
nice( )
getpriority( )
setpriority( )
sched_getscheduler( )
sched_setscheduler( )
sched_getparam( )
sched_setparam( )
sched_yield( )
sched_get_priority_min( )
sched_get_priority_max( )
sched_rr_get_interval( )
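Several of the calls listed above can be tried from user space through Python's os module, which wraps them on Linux. The sketch below only reads scheduling information and yields the CPU; the scheduling_summary helper and its dictionary keys are hypothetical names, and the code assumes a Linux system where the os.sched_* functions are available.

```python
import os


def scheduling_summary(pid=0):
    """Collect read-only scheduling facts about a process (0 means the caller)."""
    return {
        # getpriority(): the nice value of the process
        "nice": os.getpriority(os.PRIO_PROCESS, pid),
        # sched_getscheduler(): the scheduling policy in use
        "policy": os.sched_getscheduler(pid),
        # sched_getparam(): the real-time priority (0 for normal policies)
        "rt_priority": os.sched_getparam(pid).sched_priority,
        # sched_get_priority_min() / sched_get_priority_max() for round robin
        "rr_priority_range": (
            os.sched_get_priority_min(os.SCHED_RR),
            os.sched_get_priority_max(os.SCHED_RR),
        ),
    }


def yield_cpu():
    """sched_yield(): voluntarily give up the CPU to other runnable tasks."""
    os.sched_yield()
```

The calls that change state (setpriority, sched_setscheduler, sched_setparam) are left out on purpose, since they typically require elevated privileges.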
Memory management
Xen allocates physical memory to the domains at page granularity.
Domains may receive non-contiguous physical memory.
So Xen makes a distinction between machine memory and pseudo-physical memory.
Machine memory refers to the entire amount of memory installed in the machine.
Pseudo-physical memory, on the other hand, is a per-domain abstraction.
Memory management
Xen maintains a globally readable machine-to-physical table
Each domain is also supplied with a physical-to-machine table which performs the inverse mapping.
Architecture dependent code in guest operating systems can then use the two tables to provide the abstraction of pseudo-physical memory.
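The two tables can be illustrated with a small model. This is a sketch under stated assumptions, not Xen's data layout: the Domain class, the dictionary used for the global table and the frame numbers are hypothetical.

```python
class Domain:
    """Hypothetical domain owning a set of (possibly non-contiguous) machine frames."""

    def __init__(self, dom_id, machine_frames, m2p):
        self.dom_id = dom_id
        # physical-to-machine table: index is the pseudo-physical frame number
        self.p2m = list(machine_frames)
        # record the inverse mapping in the global machine-to-physical table
        for pfn, mfn in enumerate(self.p2m):
            m2p[mfn] = pfn

    def pfn_to_mfn(self, pfn):
        """Translate a pseudo-physical frame number to a machine frame number."""
        return self.p2m[pfn]


def mfn_to_pfn(m2p, mfn):
    """Translate a machine frame number back to a pseudo-physical frame number."""
    return m2p[mfn]
```

From the guest's point of view, frames 0, 1, 2, … look contiguous even though the machine frames behind them are scattered.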
Page Table Updates
Read-only access is given to page tables.
The guest OS must explicitly request any modifications (through hypercalls).
Xen validates all such requests and only applies updates that it deems safe.
This is necessary to prevent domains from adding arbitrary mappings to their page tables.
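The validate-then-apply pattern can be sketched as below. The safety rule used here (a domain may only map machine frames it owns) is a deliberately simplified assumption, and the Hypervisor class and update_mapping method are hypothetical names rather than Xen's actual hypercall interface.

```python
class Hypervisor:
    """Hypothetical model of a hypervisor that validates page table updates."""

    def __init__(self):
        self.owner = {}        # machine frame -> id of the owning domain
        self.page_tables = {}  # domain id -> {virtual page: machine frame}

    def allocate(self, dom_id, mfn):
        """Give a machine frame to a domain."""
        self.owner[mfn] = dom_id
        self.page_tables.setdefault(dom_id, {})

    def update_mapping(self, dom_id, vpn, mfn):
        """Stand-in for an update request: apply it only if it is deemed safe."""
        if self.owner.get(mfn) != dom_id:
            return False       # rejected: the frame belongs to someone else
        self.page_tables[dom_id][vpn] = mfn
        return True
```

A request that targets another domain's frame is refused and leaves the page table unchanged, which is the isolation property the slide describes.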
Writable Page Tables
Guest OSes may request writable page tables as well.
Xen must still validate modifications to ensure secure partitioning.
Xen thus traps write attempts to certain memory pages.
Handling the trap
Xen temporarily allows write access to that page while at the same time disconnecting it from the page table that is currently in use.
The newly-updated entries cannot be used by the MMU until Xen revalidates and reconnects the page.
Reconnection occurs automatically later in a number of situations, e.g. when the domain is preempted.
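The disconnect, write, revalidate and reconnect sequence can be sketched as a small state machine. This is a simplified model, not Xen's implementation: the WritablePageTable class and its methods are hypothetical, the validity rule (the frame must be owned by the domain) is an assumption, and dropping unsafe entries on revalidation is one possible policy chosen for illustration.

```python
class WritablePageTable:
    """Hypothetical model of one writable page-table page."""

    def __init__(self, owned_frames):
        self.owned_frames = set(owned_frames)
        self.entries = {}      # validated entries, usable by the simulated MMU
        self.staged = {}       # guest-written entries awaiting validation
        self.connected = True  # is the page hooked into the active page table?

    def guest_write(self, vpn, mfn):
        """A guest write traps: disconnect the page, then allow the write."""
        if self.connected:
            self.connected = False
            self.staged = dict(self.entries)
        self.staged[vpn] = mfn

    def mmu_lookup(self, vpn):
        """The MMU cannot use entries of a disconnected page."""
        return self.entries.get(vpn) if self.connected else None

    def revalidate(self):
        """Called later (e.g. on preemption): keep safe entries and reconnect."""
        if self.connected:
            return
        self.entries = {v: m for v, m in self.staged.items()
                        if m in self.owned_frames}
        self.staged = {}
        self.connected = True
```

Batching is the benefit of this scheme: the guest can write many entries after a single trap, and validation happens once when the page is reconnected.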
Shadow Page Tables
Another type of page table
The guest OS uses an independent copy of the page tables, unknown to the hardware.
Xen propagates changes made to the guest's tables to the real ones, and vice versa.
VM assists
Xen provides a number of “assists” for guest memory management.
Hypercall used:
vm_assist(unsigned int cmd, unsigned int type);
The cmd parameter describes the action to be taken.
The type parameter describes the kind of assist that is being referred to.
Conclusions
Virtualization is a very exciting area
Implementation issues still exist
We are still moving toward real-machine-like performance
With hardware-supported virtualization and multi-core, multi-threaded hardware, things are now looking very bright!
A quote to end it
Would PhD virtualization be when several people get a PhD but only one is doing the work? :
JoshTriplett on Xen IRC