Page 1: Xen is not just paravirtualization - Dongli Zhang · Xen is not just paravirtualization Dongli Zhang Oracle Asia Research and Development Centers (Beijing) dongli.zhang@oracle.com

Xen is not just paravirtualization

Dongli Zhang

Oracle Asia Research and Development Centers (Beijing)

dongli.zhang@oracle.com

December 16, 2016

Dongli Zhang (Oracle) Xen is not just paravirtualization December 16, 2016 1 / 30

Plan

Virtualization

Xen Virtualization

When discussing virtualization …

1) CPU Virtualization?

2) Memory Virtualization?

3) Device Virtualization?

What is virtualization

A virtual machine is taken to be an efficient, isolated duplicate of the real machine (from Formal Requirements for Virtualizable Third Generation Architectures, Gerald J. Popek and Robert P. Goldberg, 1974)

Trap and Emulate

Virtual Machine (Guest) at Unprivileged Mode

Virtual Machine Monitor (Host or Hypervisor) at Privileged Mode

[Figure: Guest OS + Applications (Unprivileged) on top of the Virtual Machine Monitor (Privileged), which provides MMU Emulation, CPU Emulation, and IRQ Emulation. Arrows between the two layers are labeled Privileged Instruction, Page Fault, and vIRQ.]
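
The trap-and-emulate loop can be sketched as a toy interpreter. This is only an illustration under simple assumptions: the instruction names, the `VirtualCPU` state, and the `VMM` class are hypothetical and do not come from any real hypervisor. Unprivileged instructions run directly, while privileged ones trap to the monitor, which applies their effect to the virtual CPU state only.

```python
# Toy trap-and-emulate loop. All names are illustrative, not from a real VMM.
PRIVILEGED = {"cli", "sti", "hlt"}  # these trap when the guest runs unprivileged


class VirtualCPU:
    """State the VMM keeps on behalf of the guest."""

    def __init__(self):
        self.interrupts_enabled = True
        self.halted = False
        self.acc = 0


class VMM:
    def __init__(self):
        self.vcpu = VirtualCPU()
        self.traps = 0

    def emulate(self, insn):
        """Apply a privileged instruction to the virtual CPU, not the real one."""
        self.traps += 1
        if insn == "cli":
            self.vcpu.interrupts_enabled = False
        elif insn == "sti":
            self.vcpu.interrupts_enabled = True
        elif insn == "hlt":
            self.vcpu.halted = True

    def run_guest(self, program):
        for insn in program:
            if self.vcpu.halted:
                break
            if insn in PRIVILEGED:
                self.emulate(insn)      # trap: control passes to the VMM
            elif insn == "inc":
                self.vcpu.acc += 1      # unprivileged: runs directly
            else:
                raise ValueError(f"unknown instruction: {insn}")
        return self.vcpu


vmm = VMM()
state = vmm.run_guest(["inc", "cli", "inc", "hlt", "inc"])
```

The final `inc` never runs, because the emulated `hlt` only halts the virtual CPU and the loop stops there.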

x86 is NOT virtualizable

Virtualizable Architecture: all sensitive instructions must also be privileged instructions (by Gerald J. Popek and Robert P. Goldberg)

critical instructions = sensitive instructions − privileged instructions

18 critical instructions on x86 (Analysis of the Intel Pentium’s Ability to Support a Secure Virtual Machine Monitor. USENIX Security 2000):

SGDT/SIDT/SLDT, SMSW, PUSHF/POPF, LAR/LSL, VERR/VERW, POP/PUSH, CALL, JMP, INT n, RET, STR, MOV

Solutions:

Binary Translation (QEMU, VMware)

Paravirtualization (Xen)

Hardware-Assisted Virtualization (Xen, KVM, VMware based on Intel-VT and AMD-V)
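
The definition of critical instructions is a plain set difference, which a few lines of Python can make concrete. The sets below are small illustrative subsets chosen for this sketch, not a complete classification of the x86 instruction set, and `is_virtualizable` is a hypothetical helper name.

```python
# Illustrative subsets only; not the full x86 classification.
privileged = {"LGDT", "LIDT", "LLDT", "LTR"}   # trap when executed outside ring 0
sensitive = privileged | {"SGDT", "SIDT", "SMSW", "PUSHF", "POPF", "STR"}

# critical instructions = sensitive instructions - privileged instructions
critical = sensitive - privileged


def is_virtualizable(sensitive, privileged):
    """Popek/Goldberg condition: every sensitive instruction is privileged."""
    return sensitive <= privileged
```

With these subsets, `critical` is non-empty and `is_virtualizable(sensitive, privileged)` is false, which is the situation the slide describes for classic x86.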

Solution 1/3: Binary Translation

philosophy: rewrite critical instructions
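
A minimal sketch of the rewriting idea, assuming instructions are plain strings and that the monitor exposes one emulation routine per critical opcode. The `vmm_emulate_*` names, the `CRITICAL` set, and the translation cache are hypothetical; real translators such as QEMU work on machine code, not text.

```python
# Hypothetical instruction names and emulation entry points, for illustration.
CRITICAL = {"popf", "sgdt", "smsw"}


def translate_block(block):
    """Return a translated copy of a basic block.

    Critical instructions are replaced by a call into the monitor's
    emulation routine; everything else is copied unchanged.
    """
    translated = []
    for insn in block:
        opcode = insn.split()[0]
        if opcode in CRITICAL:
            translated.append(f"call vmm_emulate_{opcode}")
        else:
            translated.append(insn)
    return translated


translation_cache = {}


def get_translated(addr, block):
    """Translate each block once and reuse the result on later executions."""
    if addr not in translation_cache:
        translation_cache[addr] = translate_block(block)
    return translation_cache[addr]
```

Caching by block address is what keeps the approach practical: the rewriting cost is paid once per block rather than once per execution.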

Solution 2/3: Hardware Virtualization (Intel VT)

philosophy: introduce a new privileged mode

[Figure: Root Mode (VMM) and two Non-Root Mode (Guest) contexts, each with its own Ring 0 and Ring 3. VM Entry switches from root mode to a guest, VM Exit switches back to root mode. VMXON and VMXOFF turn VMX operation on and off.]

KVM (Kernel-based Virtual Machine)

CPU hardware virtualization extensions (Intel VT or AMD-V)

Loadable kernel module (kvm.ko, kvm-intel.ko/kvm-amd.ko)

QEMU as userspace emulator

Solution 3/3: Paravirtualization

philosophy: replace critical instructions with hypercalls

A hypercall is a software trap from a domain to the hypervisor, just as a syscall is a software trap from an application to the kernel

x86_32: int 0x82

x86_64: syscall instruction

x86 Intel-VT: vmcall instruction

              x86 32-bit pvm    x86 64-bit pvm    x86 vt-x hvm/pvhvm
user          ring 3            ring 3            non-root ring 3
kernel        ring 1            ring 3            non-root ring 0
xen           ring 0            ring 0            root ring 0
hypercall     via int 0x82      via syscall       via vmcall

On x86 64-bit pvm, system calls are also issued via syscall, so the Xen hypervisor checks in which mode the syscall instruction was triggered
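
The 64-bit PV case can be sketched as a small dispatcher. This is a toy model, not Xen's code: `ToyXen`, the mode constants, and the `console_write` hypercall name are all assumptions made for the example. It only shows the decision the slide describes: one instruction, two meanings, distinguished by the mode the vcpu was in.

```python
# Toy model of 64-bit PV dispatch. Names are illustrative, not Xen's code.
GUEST_USER, GUEST_KERNEL = "guest-user", "guest-kernel"


class ToyXen:
    def __init__(self, guest_syscall_handler):
        self.guest_syscall_handler = guest_syscall_handler
        self.hypercalls = {"console_write": lambda arg: f"xen wrote {arg!r}"}

    def on_syscall_instruction(self, mode, number, arg):
        """Guest user space and guest kernel both run in ring 3, so the
        hypervisor uses the mode it tracks for the vcpu to tell them apart."""
        if mode == GUEST_KERNEL:
            return self.hypercalls[number](arg)          # treat as hypercall
        return self.guest_syscall_handler(number, arg)   # forward to guest kernel


xen = ToyXen(lambda number, arg: f"guest kernel handled {number}")
```

Calling `xen.on_syscall_instruction(GUEST_USER, "write", "hi")` is forwarded to the guest kernel's handler, while the same call with `GUEST_KERNEL` and `"console_write"` is served by the hypervisor.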

State of the Art Virtualization

Binary Translation (QEMU, Bochs, VMware)

Paravirtualization (Xen)

Hardware-assisted Virtualization (KVM, Xen, VMware)

OS-level Virtualization (Linux Container)

Programming Language Virtualization (Java, .NET CLR)

Library Virtualization (Wine, Cygwin)

What is Xen

Wikipedia

Xen Project is a hypervisor using a microkernel design, providing services that allow multiple computer operating systems to execute on the same computer hardware concurrently.

SOSP 2003: Xen and the Art of Virtualization

This paper presents Xen, an x86 virtual machine monitor which allows multiple commodity operating systems to share conventional hardware in a safe and resource managed fashion, but without sacrificing either performance or functionality.

Basic Idea of Paravirtualization

Actively inform the hypervisor, via hypercall, of the action the guest is going to take

Xen Framework 1/2

xen hypervisor (microkernel): dictator

scheduling, memory management, interrupt and device control

per-domain and per-vcpu info management

dom0 (host): privileged admin

xm/xend/xl (libxc)

pygrub/hvmloader

xenstored

qemu and paravirtual driver backend

native device driver

domU (guest): non-privileged user

paravirtual driver frontend

Xen Framework 2/2

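[Figure: Domain 0 contains xm, xend, xl, xenstore, the privcmd driver, Backend PV Drivers, Legacy Device Drivers, and QEMUs. The PVM and PVHVM guests contain Frontend PV Drivers; the HVM guest contains Legacy Device Drivers. All domains run on the Xen Hypervisor, which provides Memory Management, CPU Virtualization, and Timer Virtualization.]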

Convert Linux to Paravirtual Dom0/DomU

ELF notes (Linux) or xen guest section (MiniOS) in kernel image

Enable xen features in .config when building kernel

PV, HVM or PVHVM

Xen CPU Virtualization

vcpu ≈ task_struct

domain ≈ container or process group

xen schedules vcpu

              x86 32-bit pvm    x86 64-bit pvm    x86 vt-x hvm/pvhvm
user          ring 3            ring 3            non-root ring 3
kernel        ring 1            ring 3            non-root ring 0
xen           ring 0            ring 0            root ring 0

System call path, x86 32-bit pvm:

1. Xen sets a per-domain system call handler when the domain gets scheduled

2. system call

3. Trap to and handled in guest kernel

System call path, x86 64-bit pvm:

1. system call

2. Route to guest kernel system call handler

3. Handled in guest kernel

System call path, x86 vt-x hvm/pvhvm:

1. system call

2. Trap to and handled in guest kernel directly
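
The analogy "vcpu ≈ task_struct, domain ≈ process group" can be illustrated with a toy run queue. Note the assumption: this sketch uses plain round-robin, which is swapped in for simplicity and is not how Xen's real schedulers decide. The `VCPU` dataclass and `round_robin` function are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class VCPU:
    domain: str            # the domain groups vcpus, like a process group
    vcpu_id: int
    runnable: bool = True


def round_robin(vcpus, slices):
    """Return the (domain, vcpu_id) chosen for each time slice,
    skipping vcpus that are blocked."""
    queue = deque(vcpus)
    schedule = []
    while len(schedule) < slices and any(v.runnable for v in queue):
        vcpu = queue.popleft()
        if vcpu.runnable:
            schedule.append((vcpu.domain, vcpu.vcpu_id))
        queue.append(vcpu)   # back of the queue, runnable or not
    return schedule


vcpus = [VCPU("dom0", 0), VCPU("domU", 0), VCPU("domU", 1, runnable=False)]
```

The point of the sketch is the unit of scheduling: the hypervisor picks individual vcpus, and the domain is only a grouping attribute on each of them.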

Xen Interrupt Virtualization: Event Channel 1/2

Event Channel Types

Interdomain Event

Virtual IRQ Event

Physical IRQ Event

IPI Event

Registration

PVM registers event channel handler to Xen via register_callback(CALLBACKTYPE_event, xen_hypervisor_callback)

PVHVM sets HYPERVISOR_CALLBACK_VECTOR via HYPERVISOR_hvm_op(HVMOP_set_param, &a)

Xen Interrupt Virtualization: Event Channel 2/2

[Figure: The Xen Hypervisor holds Global Event Channel Info and Per-vcpu Event Channel Info. Domain 0, PVM, and PVHVM each hold a Global Event Channel Mask and a Per-vcpu Event Channel Mask for their vcpus.]

Domain 0 and PVM: Xen sets eip to xen_hypervisor_callback during scheduling if the vcpu has a pending event. xen_evtchn_do_upcall will traverse and handle each pending event.

HVM: Intel-VT based interrupt injection and one vector for each irq. Guest will handle the interrupt as on a native machine.

PVHVM: Intel-VT based interrupt injection and vector 0xf3 for each event. The IRQ handler for vector 0xf3 is called. xen_evtchn_do_upcall will traverse and handle each pending event.
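
The "traverse and handle each pending event" step can be sketched with two bitmaps. This is a simplified, single-level model under stated assumptions: `ToyEventChannels` and its methods are hypothetical names, and the real per-vcpu selector level is left out.

```python
class ToyEventChannels:
    """Pending and mask bitmaps for one domain, plus per-port handlers."""

    def __init__(self, nr_ports=64):
        self.nr_ports = nr_ports
        self.pending = 0     # bit n set: port n has an event waiting
        self.mask = 0        # bit n set: port n is masked
        self.handlers = {}

    def bind(self, port, handler):
        self.handlers[port] = handler

    def send(self, port):
        """Sender side: mark the port pending."""
        self.pending |= 1 << port

    def do_upcall(self):
        """Traverse and handle each pending, unmasked event."""
        handled = []
        ready = self.pending & ~self.mask
        for port in range(self.nr_ports):
            if ready & (1 << port):
                self.pending &= ~(1 << port)   # clear before handling
                self.handlers[port](port)
                handled.append(port)
        return handled
```

A masked port stays pending, so its event is not lost; it is delivered on the next upcall after the mask bit is cleared.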

Xen Memory Virtualization 1/2

Address Types

GVA (Guest Virtual Address)

GPA (Guest Physical Address) or GFN (Guest page Frame Number)

HPA (Host Physical Address) or MFN (Machine page Frame Number)

Hardware-assisted Memory Virtualization (Method 1/3): Second-Level Page Table

Intel: Extended Page Table (EPT)

AMD: Nested Page Table (NPT)

[Figure: Guest Virtual Address → Guest Page Tables (Guest CR3 Register, Non-Root Mode) → Guest Physical Address → Host Page Tables (Host EPTP Register, Root Mode) → Host Physical Address]
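
The two translation stages can be sketched with flat dictionaries. This is a deliberately simplified model: real guest page tables and EPT/NPT are multi-level structures walked by hardware, and the names `translate`, `gva_to_hpa`, and the example frame numbers are assumptions for illustration.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(addr, table):
    """Look up the page frame of addr in a flat {frame: frame} table."""
    frame, offset = addr >> PAGE_SHIFT, addr & PAGE_MASK
    if frame not in table:
        raise KeyError(f"page fault at {addr:#x}")
    return (table[frame] << PAGE_SHIFT) | offset


def gva_to_hpa(gva, guest_page_table, ept):
    gpa = translate(gva, guest_page_table)   # stage 1: guest page tables
    return translate(gpa, ept)               # stage 2: second-level tables


guest_page_table = {0x10: 0x2}   # GVA frame 0x10 -> GFN 0x2
ept = {0x2: 0x7F}                # GFN 0x2 -> MFN 0x7F
```

The guest controls only the first dictionary and the hypervisor only the second, which is why the guest can edit its own page tables without trapping.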

Xen Memory Virtualization 2/2

Direct Paging (Method 2/3): the guest manages the (GVA, HPA) page table directly

Shadow Paging (Method 3/3): the xen hypervisor maintains a shadow (GVA, HPA) page table that the guest is not aware of

[Figure, left: Direct Paging (MMU Paravirtualization). The Guest OS page tables hold MFNs and are used by the MMU. The P2M table (PFN → MFN) is mapped to the guest by the hypervisor.]

[Figure, right: Shadow Page Table. The Guest OS page tables hold PFNs. The Xen Hypervisor keeps a Shadow Table holding MFNs, and the MMU uses the Shadow Table.]
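
Both software methods can be sketched over the same flat-table model. The function names and the ownership check are assumptions for illustration; real shadow paging also has to track guest page-table writes, which this sketch leaves out.

```python
def build_shadow(guest_page_table, p2m):
    """Shadow paging: compose the guest's (GVA frame -> PFN) table with the
    p2m (PFN -> MFN) to get the (GVA frame -> MFN) table the MMU walks."""
    return {vfn: p2m[pfn] for vfn, pfn in guest_page_table.items() if pfn in p2m}


def validate_direct_update(domain_mfns, vfn, mfn, page_table):
    """Direct paging: the guest proposes an MFN itself; the hypervisor only
    checks that the frame belongs to the guest before applying the update."""
    if mfn not in domain_mfns:
        raise PermissionError(f"MFN {mfn:#x} is not owned by this domain")
    page_table[vfn] = mfn


p2m = {0: 0x100, 1: 0x101}
```

In the shadow case the guest never sees an MFN. In the direct case it does, which is why the hypervisor must validate every update.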

Xen Device Virtualization

HVM emulated legacy device (QEMU)

Paravirtual (PV) drivers

Device Passthrough (vt-d)

Virtual Function (vt-d)

PV driver vs. PCI driver

                        PCI driver                    PV driver
device abstraction      pci device, pci driver        xenbus device, xenbus driver
device discovery        PCI Tree                      Xenstore
device configuration    PCI Config Space (IO/MMIO)    Xenstore
data flow               DMA Ring Buffer               Memory Ring Buffer
shared memory           N/A or IOMMU                  Grant Table
interrupt               IOAPIC, MSI, MSI-X            Event Channel

Xenstore/Xenbus

[Figure: In Domain 0, xm / xl write the VM config to xenstore (device info, memory hotplug, ...). The xenbus in Domain 0 and the xenbus in Domain U each monitor changes in xenstore with xenwatch. Both domains run on the Xen Hypervisor.]
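
The write-and-watch pattern can be sketched as a small key-value store with prefix watches. This is not the xenstore wire protocol: `ToyXenstore` and its methods are hypothetical, and the paths are only example strings in the usual xenstore layout.

```python
class ToyXenstore:
    """A tiny path -> value store with prefix watches (not the real protocol)."""

    def __init__(self):
        self.data = {}
        self.watches = []   # list of (path_prefix, callback)

    def watch(self, prefix, callback):
        self.watches.append((prefix, callback))

    def write(self, path, value):
        self.data[path] = value
        for prefix, callback in self.watches:
            # Fire for the watched node itself and for anything below it.
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                callback(path, value)

    def read(self, path):
        return self.data[path]


store = ToyXenstore()
seen = []
store.watch("/local/domain/1/device", lambda path, value: seen.append((path, value)))
store.write("/local/domain/1/device/vif/0/state", "1")
store.write("/local/domain/1/memory/target", "1048576")
```

Only the first write lands under the watched prefix, so `seen` records one change; the toolstack and the drivers never call each other directly.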

Grant Table

[Figure: xen-netfront in Domain 1 shares pfn 1024, which holds Network Packets, with xen-netback in Domain 0. The Xen Hypervisor holds the Grant Table for Domain 1 and the Grant Table for Domain 0.]

1. Domain 1: Pick up a free grant table reference 19

2. Domain 1: I want to share pfn 1024 as grant table reference 19 to Domain 0. Domain 0 can map or copy from this page

3. Domain 1: Share ref 19 to Domain 0 via xenstore or other ways

4. Domain 0: Can I map (copy) ref 19 to my memory space?

5. Xen Hypervisor: You are allowed to access ref 19. I will map or copy the data to your memory space
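
The five steps can be sketched as a small table with a permission check. This is a toy model: `ToyGrantTable` and its method names are hypothetical and do not mirror the real grant table hypercall interface. Step 3, passing the reference to the peer, is just handing over the integer.

```python
class ToyGrantTable:
    """Grant entries of one granting domain, as checked by the hypervisor."""

    def __init__(self, owner, size=32):
        self.owner = owner
        self.entries = [None] * size   # ref -> (pfn, peer_domid)

    def get_free_ref(self):
        # Step 1: pick up a free grant table reference.
        return self.entries.index(None)

    def grant_access(self, ref, pfn, peer):
        # Step 2: offer pfn to the peer domain under this reference.
        self.entries[ref] = (pfn, peer)

    def map_ref(self, requester, ref):
        # Steps 4 and 5: the hypervisor checks the entry before mapping.
        entry = self.entries[ref]
        if entry is None or entry[1] != requester:
            raise PermissionError(f"domain {requester} may not access ref {ref}")
        return entry[0]


gt = ToyGrantTable(owner=1)
ref = gt.get_free_ref()
gt.grant_access(ref, pfn=1024, peer=0)
```

Here `gt.map_ref(0, ref)` yields pfn 1024 for Domain 0, while any other domain presenting the same reference is refused.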

I/O Ring Buffer

Usually put grant refs (not data) in the ring

Grant refs of ring pages are shared via xenstore

Usually one ring buffer for each device queue

One or more pages for each ring

Producer and Consumer (barrier)
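
The producer/consumer discipline can be sketched as a ring with free-running indices. This is an assumption-laden toy: `ToyRing` is a hypothetical class, it runs in one Python process, and the memory barriers that a real shared-memory ring needs appear only as comments.

```python
class ToyRing:
    """Single-producer, single-consumer ring with free-running indices."""

    def __init__(self, size=8):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.size = size
        self.slots = [None] * size
        self.prod = 0   # advanced only by the producer
        self.cons = 0   # advanced only by the consumer

    def push(self, grant_ref):
        if self.prod - self.cons == self.size:
            return False                        # ring full
        self.slots[self.prod & (self.size - 1)] = grant_ref
        # A real shared ring needs a write barrier here, so the slot
        # contents are visible before the index update.
        self.prod += 1
        return True

    def pop(self):
        if self.cons == self.prod:
            return None                         # ring empty
        # A real shared ring needs a read barrier here, after reading prod.
        grant_ref = self.slots[self.cons & (self.size - 1)]
        self.cons += 1
        return grant_ref
```

Because each index has exactly one writer, the two sides need no lock, only the ordering that the barriers provide.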

Xen Paravirtual Networking Framework

VM Creation Workflow

[Figure: workflow across Dom0, the xen hypervisor, xenstore, and the DomU Guest]

xm create vm.cfg: XML-RPC via socket to xend (libxc)

xl create (libxc)

Extract kernel and ramdisk from vdisk via pygrub for PVM

Ask xen hypervisor to create a VM, initiate vcpu, p2m, etc.

Write VM device info to xenstore

Boot PVM into protected mode

Boot HVM/PVHVM into real mode via hvmloader

DomU Guest: Watching at xenstore. Initiate device driver at frontend

Dom0: Watching at xenstore. Initiate device driver at backend

udev on Dom0: Ask userspace hotplug script to help configure backend

hotplug script: Bridging vif to bridge or obtain major/minor number of VM disk image file

Synchronize with each other via xenstore and finish!

Selected Xen Projects

COLO - Coarse Grain Lock Stepping

LivePatch

Stealthy monitoring with Xen altp2m

Real-Time-Deferrable-Server (RTDS) CPU Scheduler

Windows PV Receive Side Scaling

More at Xen Summit and xen-devel

Reference

Publications

Xen and the Art of Virtualization. Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. SOSP 2003

The Definitive Guide to the Xen Hypervisor. David Chisnall. 2007

Intel 64 and IA-32 Architectures Software Developer Manuals

Various system and security research papers and presentations

Miscellaneous

Xen Project Developer Summit

https://blog.xenproject.org

https://github.com/finallyjustice/JOS-vmx

Take-Home Message

What is virtualization

Paravirtualization and Hardware-assisted Virtualization

Xen vs. KVM

Grant Table, Event Channel, Paravirtual Drivers
