Faculty of Computer Science, Institute for System Architecture, Operating Systems Group
Virtualization
Dresden, 2008-12-02
Stefan Kalkowski
TU Dresden, 2008-12-02 MOS - Virtualization Slide 2 of 46
So Far ...
● Basics
● Introduction
● Threads & synchronization
● Memory
● Real-time
● Device Drivers
● Resource Management
Today: Virtualization
● Introduction
● Motivation & classification, flavors
● NOVA – a μ-hypervisor
● L4Linux: Para-virtualization on top of L4
  ● Architecture
  ● Address space layout
  ● Scenarios
  ● Freezing
One possible definition ...
Virtualization is the creation of a virtual or logical (rather than actual) version of something, such as an operating system, a hardware platform, a storage device or network resources.
Virtualization – a hype
● A lot of interest in the research community in recent years, e.g.:
  ● SOSP 03: Xen and the Art of Virtualization
  ● EuroSys 07: a whole session about virtualization
● Many new virtualization products:
  ● QEMU, VirtualBox, Intel VT, AMD-V, KVM
● Further increasing demand:
  ● VMware: from 240 to 6300 employees within the last few years
Virtualization - a new idea?
● Originates in IBM's CP/CMS series used on System/3xx mainframes (starting ~1964)
  ● Control Program - VMM
  ● Cambridge Monitor System - Guest OS
● Memory protection
● SIE instruction (VM mode)
● CP encodes much of the guest privileged state in a hardware-defined format
● IBM's first virtual memory system
Virtualization - Motivation
● Optimize utilization
● Maintainability
● Migration
  ● reliability: hardware failure
  ● load balancing
  ● economic & ecological reasons: temporary shutdown of a system
● Isolation
● Support of different OS ABIs
Classification help
● Virtualization - an overloaded term
● Some classification criteria:
  ● Objective target: hardware, OS API or ABI?
  ● Emulation vs. virtualization: do we have to interpret some or all instructions? Binary vs. byte code interpretation (e.g.: JVM)
  ● Can we modify the target software? (e.g. using para-virtualization techniques)
Support of different OS personalities
● GNU Hurd example: rebuild the POSIX API
[Figure: GNU Hurd. Components shown: user process with LibC; Auth Server, Exec Server, Process Server, Name Server, File Server, Password Server, Network Protocol Server; device drivers (Dev); Mach kernel.]
Reimplementation of the OS interface
● Used to bring existing software to other or newly created OSes
● When copying the API of an OS, target software needs to be re-linked
● In contrast, ABI emulation can run unmodified binaries, e.g.: Wine
● Disadvantage of both approaches:
  ● Great effort
  ● Shooting at a moving target
Virtualize the hardware
● Instead of emulating the OS API or ABI, target the underlying platform
● Full emulation:
  ● Any platform can be used
  ● QEMU emulates x86, ARM, SPARC, PowerPC ...
● Virtualization of the underlying platform:
  ● no additional interpretation of code running in user-mode
  ● processor emulation only for kernel- and real-mode (x86)
  ● Examples: KQEMU, VMware, VirtualBox ...
Problems with x86 virtualization
● Ring-alias problem
  ● Guest OS runs in privilege level > 0
● Address space compression
  ● Part of the guest OS's address space is used by the VMM (e.g. IDT, GDT)
● Some sensitive instructions do not trap in user-mode but fail silently, e.g.:
  ● POPF: pops the stack into the EFLAGS register; causes interrupt handling problems (IF not updated in user-mode)
● Faulting incessantly implies performance loss
  ● kernel entry/exit -> doubled context switch
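The silent POPF failure can be illustrated with a small model. This is a minimal sketch, not a full description of the instruction: it only models the IF bit, ignores IOPL updates and virtual-8086 mode, and the function name `popf` and its parameters are chosen here for illustration.

```python
IF_BIT = 1 << 9  # interrupt-enable flag (IF) in EFLAGS


def popf(eflags: int, popped: int, cpl: int, iopl: int) -> int:
    """Return EFLAGS after POPF in a simplified protected-mode model.

    Assumption: only IF is modelled as a protected bit.
    """
    if cpl <= iopl:
        # Sufficiently privileged: IF is taken from the popped value.
        return popped
    # Not privileged enough: IF keeps its old value and no trap occurs,
    # so a VMM never learns that the guest tried to change it.
    return (popped & ~IF_BIT) | (eflags & IF_BIT)


# A guest kernel deprivileged to ring 1 tries to clear IF.
guest_view = popf(eflags=IF_BIT, popped=0, cpl=1, iopl=0)
# The same instruction executed natively in ring 0.
native_view = popf(eflags=IF_BIT, popped=0, cpl=0, iopl=0)
```

In the deprivileged case IF stays set although the guest believes interrupts are now disabled, which is the interrupt handling problem named above.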
Platform virtualization in software
● Guest OS runs natively in less privileged mode
● Privileged instructions trap and are handled by the VMM (trap-and-emulate)
● VMM derives and manages shadow structures from guest's primary structures, e.g.: shadow page tables
● JIT binary translation
● Examples: VMware, QEMU, VirtualBox
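Trap-and-emulate can be sketched as a loop in which unprivileged instructions run directly and privileged ones raise a trap that the VMM applies to virtual state. The instruction names, the `ToyVMM` class and the tuple encoding are hypothetical and exist only for this illustration.

```python
PRIVILEGED = {"cli", "sti", "mov_cr3"}


class PrivilegedTrap(Exception):
    """Raised when deprivileged guest code executes a privileged instruction."""

    def __init__(self, instr):
        super().__init__(instr[0])
        self.instr = instr


def run_deprivileged(instr, regs):
    """Execute one instruction 'natively'; privileged ones trap."""
    op = instr[0]
    if op in PRIVILEGED:
        raise PrivilegedTrap(instr)
    if op == "add":
        regs[instr[1]] += instr[2]


class ToyVMM:
    def __init__(self):
        self.regs = {"eax": 0}              # guest-visible registers
        self.virt = {"if": 1, "cr3": 0}     # virtual privileged state
        self.traps = 0

    def emulate(self, instr):
        """Apply the effect of a trapped instruction to the virtual state."""
        op = instr[0]
        if op == "cli":
            self.virt["if"] = 0
        elif op == "sti":
            self.virt["if"] = 1
        elif op == "mov_cr3":
            self.virt["cr3"] = instr[1]

    def run(self, program):
        for instr in program:
            try:
                run_deprivileged(instr, self.regs)
            except PrivilegedTrap as trap:
                self.traps += 1
                self.emulate(trap.instr)


vmm = ToyVMM()
vmm.run([("add", "eax", 5), ("cli",), ("mov_cr3", 0x1000), ("add", "eax", 2)])
```

Only the two privileged instructions cost a trap; the additions run without VMM involvement.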
Shadow page tables
● Memory tracing of the guest's page tables
  ● Write-protect them
  ● Decode and emulate the guest's page faults
[Figure: layers shown: Host OS, Guest OS, Guest Application.]
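A shadow page table combines the guest's own mapping (guest-virtual to guest-physical) with the VMM's mapping (guest-physical to host-physical). The sketch below assumes page-granular dictionaries instead of real page table formats; the class and method names are hypothetical.

```python
class ShadowMMU:
    def __init__(self, gpa_to_hpa):
        self.gpa_to_hpa = dict(gpa_to_hpa)  # VMM's guest-physical -> host-physical
        self.guest_pt = {}                  # guest's table: gva page -> gpa page
        self.shadow_pt = {}                 # table the hardware would use: gva -> hpa

    def guest_pt_write(self, gva_page, gpa_page):
        """Runs when a write to the write-protected guest table traps."""
        self.guest_pt[gva_page] = gpa_page
        self.shadow_pt.pop(gva_page, None)  # drop the stale shadow entry

    def translate(self, gva_page):
        """Translate via the shadow table, filling it lazily on a miss."""
        if gva_page not in self.shadow_pt:
            if gva_page not in self.guest_pt:
                # The guest itself has no mapping: forward the fault to it.
                raise LookupError("true guest page fault")
            gpa_page = self.guest_pt[gva_page]
            self.shadow_pt[gva_page] = self.gpa_to_hpa[gpa_page]
        return self.shadow_pt[gva_page]


mmu = ShadowMMU({0: 100, 1: 101})
mmu.guest_pt_write(5, 1)
first = mmu.translate(5)
mmu.guest_pt_write(5, 0)   # guest remaps the page; shadow entry is invalidated
second = mmu.translate(5)
```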
Hardware enabled virtualization
● Example: Intel VT
  ● root and non-root mode, VM entry and exit
  ● Virtual Machine Control Structure (VMCS) in physical memory holds guest and host state and some additional control information
  ● VMCS is used to determine VM exit conditions, e.g.: whether a specific interrupt can be handled by the guest
➔ Similar to IBM System/3xx
XEN
NOVA – μ hypervisor approach
● NOVA OS Virtualization Architecture
● Separate hypervisor and VMM(s)
[Figure: μ-hypervisor in kernel mode; servers and a VMM in user mode; guest OSes in non-root mode, the rest in root mode.]
NOVA
● Hypervisor manages protection domains:
  ● address spaces and virtual machines
● Virtual machines have associated virtualization handlers -> the VMMs
● VMMs handle virtualization faults and implement virtual devices
● Splitting functionality of hypervisor and VMM
➔ Reduces complexity of hypervisor which runs security-sensitive applications beside the VMs
Paravirtualization
● Modify OS to integrate it in the runtime environment
● L4Linux, Xen (without VT), User Mode Linux
● Afterburner (Karlsruhe): modify binary code
● Advantages:
  ● Plug virtualization holes without additional hardware support
  ● Effort is small compared to OS emulation
  ● Cooperation between guest OS (and its applications) and the non-VM system
L4Linux: history
● Presented at SOSP '97
  ● based on x86 Linux 2.0 on top of first L4 kernel
● (L4)Linux has evolved over the years
  ● 2.2 supported MIPS and x86
  ● 2.4 first version to run on L4Env
  ● 2.6 uses 'paravirtualization' L4 kernel features
● Recently
  ● Latest Linux release 2.6.27
  ● ARM support
  ● Freeze functionality
  ● SMP
Linux Architecture
[Figure: Linux architecture. Applications run in user mode above the system-call interface. The Linux kernel consists of an architecture-independent part (processes: scheduling, IPC; memory management: page allocation, address spaces, swapping; file systems: VFS, file system implementations; networking: sockets, protocols; device drivers) and architecture-dependent parts (system-call interface, hardware access). Below the kernel: hardware (CPU, memory, PCI, devices).]
● Architecture dependent part
  ● Small, for x86 about 2% of the kernel
  ● System call interface:
    ● Kernel entry
    ● Signal delivery
    ● Copy from/to user space
  ● Hardware access:
    ● CPU state and features
    ● MMU
    ● Interrupts
    ● Memory mapped I/O, I/O ports
● Architecture dependent part implements generic interface used by independent part
L4Linux Architecture
[Figure: L4Linux architecture. The Linux kernel (same internal structure as before) and each application run as L4 tasks in user mode, next to further L4 tasks such as DMPhys, L4IO, Console and Names. The Fiasco kernel runs in kernel mode on the hardware.]
● The Linux kernel and each Linux user process run in their own L4 task
● The L4/L4Env specific part is implemented as a separate architecture: arch/l4, include/asm-l4
● The L4/L4Env architecture dependent part is itself divided into an x86 and an ARM specific part
● Most code is reused from the native x86 or ARM specific part
Linux address space layout
● 0x0 – TASK_SIZE
  ● user part
  ● changes on every context switch
● TASK_SIZE – 0xF...
  ● kernel part
  ● constant in all address spaces
● Physical memory mapped beginning at PAGE_OFFSET
[Figure: Linux address space from 0x00000000 to 0xFFFFFFFF. User address space (application, libraries, …) below TASK_SIZE; kernel address space from PAGE_OFFSET (0xC0000000) upwards with kernel image, physical memory, and vmalloc, kmap, ….]
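The layout above can be expressed as a few address computations. This sketch assumes the 3 GB / 1 GB split shown in the figure, with TASK_SIZE equal to PAGE_OFFSET, and covers only the directly mapped part of physical memory (not vmalloc or kmap areas). The helper names are chosen for illustration.

```python
PAGE_OFFSET = 0xC0000000   # start of the kernel part in the figure
TASK_SIZE = PAGE_OFFSET    # assumption: user part ends where the kernel part begins


def region(vaddr: int) -> str:
    """Classify a 32-bit virtual address as user or kernel part."""
    return "user" if vaddr < TASK_SIZE else "kernel"


def phys_to_virt(paddr: int) -> int:
    """Kernel virtual address of directly mapped physical memory."""
    return paddr + PAGE_OFFSET


def virt_to_phys(vaddr: int) -> int:
    """Inverse of phys_to_virt; only valid for the direct mapping."""
    if vaddr < PAGE_OFFSET:
        raise ValueError("not a directly mapped kernel address")
    return vaddr - PAGE_OFFSET
```

For example, `region(0x08048000)` yields `"user"`, and physical address 0x100000 appears in the kernel part at 0xC0100000.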
L4Linux address space layout
[Figure: native Linux layout (user address space below TASK_SIZE, kernel address space from PAGE_OFFSET at 0xC0000000) next to the L4Linux layout. The L4Linux server and each L4Linux user process have separate address spaces from 0x00000000 to 0xFFFFFFFF, each with the Fiasco microkernel from 0xC0000000 upwards. The server's space holds kernel image, guest-physical memory, and vmalloc, kmap, …; the user process's space holds application, libraries, ….]
● L4Linux user task
  ● little change
  ● TASK_SIZE – VDSO
● VDSO page
  ● originally in kernel part of address space
  ● virtual dynamic shared object for 'vsyscall'
  ● contains architecture dependent kernel entry code
[Figure: L4Linux user address space from 0 to TASK_SIZE with the vdso page at 0xBFC0000.]
L4Linux: problems to be solved
● L4Linux server has to:
  ● have some basic resources (memory, I/O)
  ● manage page tables of its user processes
  ● handle exceptions from user processes
  ● schedule its tasks
● L4Linux user processes have to:
  ● 'enter' the L4Linux kernel (now in a different address space)
● Kernel needs information from user processes formerly accessible in the same address space, e.g.: syscall arguments
Linux address space management
● Architecture-independent part:
  ● General page table management
  ● Implements allocator strategies
  ● Page replacement strategies
  ● Assumes a 4-level page table provided by the architecture-dependent part
● Architecture-dependent part
  ● Set, remove and test entries
  ● TLB handling
  ● Linux for x86 uses 2-level page tables
[Figure: components shown: application; Linux kernel with memory management (page allocation, address spaces, swapping) and thread_info; architecture-dependent part (i386); hardware.]
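A 2-level page table walk for 32-bit x86 without PAE can be sketched as follows. The dictionaries stand in for the page directory and page tables, the constants follow the usual 10/10/12-bit split of a 32-bit address, and `split` and `walk` are names chosen for this illustration.

```python
PGDIR_SHIFT = 22        # top 10 bits select the page directory entry
PAGE_SHIFT = 12         # low 12 bits are the offset within a 4 KiB page
PTRS_PER_TABLE = 1024   # entries per directory and per page table


def split(vaddr: int):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    pgd_index = vaddr >> PGDIR_SHIFT
    pte_index = (vaddr >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1)
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return pgd_index, pte_index, offset


def walk(page_directory: dict, vaddr: int):
    """Return the physical address for vaddr, or None if it is not mapped."""
    pgd_index, pte_index, offset = split(vaddr)
    page_table = page_directory.get(pgd_index)
    if page_table is None or pte_index not in page_table:
        return None
    frame = page_table[pte_index]
    return (frame << PAGE_SHIFT) | offset
```

With a directory that maps entry 0x300 to a table containing `{0x101: 0x5}`, the address 0xC0101234 resolves to frame 0x5 with offset 0x234.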
L4Linux address space management
● L4Linux user processes are actually L4 tasks
● L4Linux server is the pager
● Hardware page tables are managed by the L4 kernel
● L4Linux page tables are mirrored
● L4Linux uses map/unmap operations
● Page table entries are added lazily (when a page fault occurs)
[Figure: components shown: application; Linux kernel with memory management (page allocation, address spaces, swapping) and thread_info; architecture-dependent part (i386); Fiasco kernel; hardware.]
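[Figure: the L4Linux server with its guest-physical memory runs on the Fiasco kernel next to Loader, Roottask, DMPhys and Names. A page fault (with EIP) raised by Init is delivered to the L4Linux server, which answers with a flexpage IPC.]
Page fault handling in the L4Linux server:
1) inspect page table
2) if not found, call 'linux' page table handling
3) inspect page table again
4) map the page found in the table to the application (flexpage IPC)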
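The four pager steps above can be sketched as follows. Everything here is a simplified model: dictionaries replace real page tables, `linux_fault_handling` is a stand-in for the regular Linux page fault path, and recording the mapping in `l4_mappings` stands for the flexpage IPC reply.

```python
class L4LinuxPagerSketch:
    def __init__(self):
        self.linux_page_table = {}   # user virtual page -> guest-physical page
        self.l4_mappings = {}        # what has been mapped into the user task
        self._next_free = 0          # trivial guest-physical page allocator

    def linux_fault_handling(self, vpage):
        """Stand-in for Linux's own fault handling: allocate and enter a page."""
        self.linux_page_table[vpage] = self._next_free
        self._next_free += 1

    def handle_pagefault(self, vpage):
        gpage = self.linux_page_table.get(vpage)        # 1) inspect page table
        if gpage is None:
            self.linux_fault_handling(vpage)            # 2) call Linux handling
            gpage = self.linux_page_table.get(vpage)    # 3) inspect again
        if gpage is None:
            raise MemoryError("fault could not be resolved")
        self.l4_mappings[vpage] = gpage                 # 4) map page to the app
        return gpage


pager = L4LinuxPagerSketch()
a = pager.handle_pagefault(7)
b = pager.handle_pagefault(8)
again = pager.handle_pagefault(7)   # entry already exists, only step 1 and 4 run
```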
General exception handling
● If an L4 task raises an exception, the kernel sends an exception IPC to its handler (feature in L4.Fiasco and L4.X2)
● Exception IPC contains the CPU state of the client
● Exception handler can reply with a new state, for instance another instruction pointer
● Exception IPC can be used to recognize Linux system calls:
  ● INT 0x80 will trigger an exception, due to the lack of an IDT gate in the L4 kernel
● L4Linux server acts as exception handler for its user processes
L4Linux kernel entry
● System call costs:
  ● 2x kernel entry/exit (exception and reply)
  ● 2x address space switch
[Figure: system call path in four numbered steps between the L4Linux user process (INT 0x80), the Fiasco microkernel, and the L4Linux server (arch. dependent and arch. independent parts).]
Interrupt handling
● Interrupt messages are received in separate threads
● Interrupt threads run at a higher priority than other Linux threads (Linux semantics)
● Interrupt threads wake up the idle thread or force the running user process to enter the Linux server
● Plain Linux disables interrupts for synchronization
● Use a lock instead of CLI/STI
[Figure: components shown: hardware; Fiasco kernel; L4IO; L4Linux server with main thread, interrupt threads and a device driver calling request_irq(irq_no, handler, …).]
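Replacing CLI/STI with a lock can be sketched with ordinary threads. This is a minimal model, assuming a single non-nested disable/enable pair; the class and method names are hypothetical and a real implementation has to handle nesting and priorities.

```python
import threading


class VirtualIrqFlag:
    """Models 'interrupts disabled' as holding a lock."""

    def __init__(self):
        self._lock = threading.Lock()

    def local_irq_disable(self):
        self._lock.acquire()    # stands in for CLI

    def local_irq_enable(self):
        self._lock.release()    # stands in for STI

    def deliver(self, handler):
        """Run by an interrupt thread: waits while 'interrupts are disabled'."""
        with self._lock:
            handler()


flag = VirtualIrqFlag()
log = []

flag.local_irq_disable()
irq_thread = threading.Thread(target=flag.deliver, args=(lambda: log.append("irq"),))
irq_thread.start()
log.append("critical section")   # the handler cannot run before the lock is released
flag.local_irq_enable()
irq_thread.join()
```

The interrupt handler is delayed until the main thread leaves its critical section, which is the effect CLI/STI has on native Linux.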
Open issues
● Linux kernel needs to access address space of user processes (e.g. syscall arguments)
  ● walk page tables of user process
● Security problems with DMA
  ● move device drivers out of L4Linux
  ● I/O MMU
● L4Linux has to schedule its tasks itself
  ● only one L4Linux process is active at a time
  ● other processes are waiting in IPC (exception or pagefault)
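The first open issue, reading user data by walking the user process's page table, can be sketched as a page-by-page copy out of guest-physical memory. The flat dictionary as page table and the name `copy_from_user_sketch` are assumptions made for this illustration.

```python
PAGE_SIZE = 4096


def copy_from_user_sketch(guest_mem, user_pt, uaddr, length):
    """Copy `length` bytes from user address `uaddr` via the user's page table.

    guest_mem: bytearray holding guest-physical memory
    user_pt:   dict mapping user virtual page -> guest-physical page
    """
    out = bytearray()
    while length > 0:
        vpage, offset = divmod(uaddr, PAGE_SIZE)
        if vpage not in user_pt:
            raise MemoryError("user address not mapped")
        chunk = min(length, PAGE_SIZE - offset)   # stay within the current page
        base = user_pt[vpage] * PAGE_SIZE + offset
        out += guest_mem[base:base + chunk]
        uaddr += chunk
        length -= chunk
    return bytes(out)


guest_mem = bytearray(3 * PAGE_SIZE)
user_pt = {0x10: 2, 0x11: 0}                      # adjacent user pages, scattered frames
guest_mem[2 * PAGE_SIZE + 4094:3 * PAGE_SIZE] = b"he"
guest_mem[0:3] = b"llo"
data = copy_from_user_sketch(guest_mem, user_pt, 0x10 * PAGE_SIZE + 4094, 5)
```

The copy crosses a page boundary, so each page has to be translated separately.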
Hybrid applications
● Linux applications that are 'L4 aware'
● Need to be detected by the Linux server
  ● Linux server puts them in UNINTERRUPTIBLE state in its own data structures
  ● Will not disturb ongoing IPC in hybrid task
● L4Linux user processes run as Aliens
  ● Special alien flag used when creating a task
  ● Aliens trap when making an L4 system call
  ● Exception handler monitors system call
  ● Fiasco-only feature
Real-time video player
● L4Linux user processes might use native L4 tasks or provide services themselves
[Figure: components shown: Fiasco kernel; Loader, Roottask, DMPhys, Names; DOpE; RT-MPEG player; L4Linux with MPlayer frontend. Arrow labels: controls, FS entry, get file.]
Multiple L4Linux instances
● Using multiple instances concurrently, e.g. for each security domain
● Devices need to be multiplexed (see resource management lesson: ORe, nitpicker, windhoek)
● Communication through network, special IPC monitors ...
[Figure: two L4Linux servers, each with its own applications, on the Fiasco kernel next to Loader, Roottask, DMPhys, Names and the virtualization infrastructure.]
Use L4Linux as a toolbox
● L4Linux instances can provide access to various complex software stacks, e.g.:
  ● Network stacks
  ● Drivers
  ● Filesystems
[Figure: components shown: Fiasco kernel; Loader, Roottask, DMPhys, Names; an L4 app; L4Linux with an alien filesystem wrapper.]
Freezing L4Linux
● Problem with several L4Linux instances:
  ● Wasting memory
  ● Long boot process
● Solution:
  ● Freeze one L4Linux instance
  ● Make copies when necessary
  ● Use Dataspace manager interface to minimize necessary changes
● Pages are mapped read-only to new instance
● Use copy on write (CoW)
[Figure: components shown: Fiasco kernel; Loader, Roottask, DMPhys, Names; Freezer, used as the default dataspace manager; two L4Linux instances, each with code and guest-physical memory. Operations: freeze(), reinstantiate().]
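Freezing with copy on write can be sketched as a shared, immutable page snapshot plus a per-instance set of privately copied pages. The method names `freeze` and `reinstantiate` follow the figure labels; the classes themselves and the page representation are assumptions made for this illustration.

```python
class CowInstance:
    """A new instance that shares frozen pages until it writes to them."""

    def __init__(self, frozen_pages):
        self._frozen = frozen_pages   # shared, read-only snapshot
        self._private = {}            # page number -> private copy

    def read(self, page_no):
        return self._private.get(page_no, self._frozen[page_no])

    def write(self, page_no, data):
        """Models the write fault on a read-only page: copy first, then modify."""
        page = bytearray(self.read(page_no))
        page[:len(data)] = data
        self._private[page_no] = bytes(page)

    def private_pages(self):
        return len(self._private)


class Freezer:
    def __init__(self):
        self._frozen = None

    def freeze(self, pages):
        """Take an immutable snapshot of a running instance's memory."""
        self._frozen = tuple(bytes(p) for p in pages)

    def reinstantiate(self):
        """Create a new instance backed by the frozen pages."""
        return CowInstance(self._frozen)


freezer = Freezer()
freezer.freeze([b"AAAA", b"BBBB"])
first = freezer.reinstantiate()
second = freezer.reinstantiate()
first.write(0, b"zz")
```

Only the written page is duplicated, which addresses both problems named before: memory is shared and a new instance does not have to boot.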
Summary
● Virtualization flavors
  ● API or ABI
  ● Full or partial virtualization
  ● Hardware (especially x86) or OS
● NOVA
  ● Minimize hypervisor by taking out 'virtualization policy'
● L4Linux – paravirtualization in detail
  ● Address space layout & management
  ● Taming Linux (interrupts, I/O memory)
  ● Freeze it
References
● Tom Van Vleck: 'The IBM 360/67 and CP/CMS' http://www.multicians.org/thvv/360-67.html
● Keith Adams and Ole Agesen: 'A Comparison of Software and Hardware Techniques for x86 Virtualization' ASPLOS 2006 http://www.vmware.com/pdf/asplos235_adams.pdf
● Intel Virtualization Technology http://www.intel.com/technology/itj/2006/v10i3/1-hardware/1-abstract.htm
● H. Härtig, M. Roitzsch, A. Lackorzynski, B. Döbel and A. Böttcher: 'L4 – Virtualization and Beyond'
● Udo Steinberg: 'NOVA Hypervisor Architecture Whitepaper' Internal Report 2007
● L4Linux Webpage http://os.inf.tu-dresden.de/L4/LinuxOnL4
● Adam Lackorzynski: 'L4Linux Porting Optimizations' Diploma Thesis 2004 http://os.inf.tu-dresden.de/papers_ps/adam-diplom.pdf
Outlook
● Now, paper reading:
  ● Formal Requirements for Virtualizable Third Generation Architectures
● In 2 weeks, virtualization part II:
  ● Legacy containers through emulation
  ● Libc and Qt on top of L4