VirtFS Ols2010

May 10, 2015

Fabio Pinelli

Slides presented at OLS2010 describing VirtFS work.
Transcript
Page 1: VirtFS Ols2010

© 2010 IBM Corporation

IBM Linux Technology Center

VirtFS: A virtualization-aware File System pass-through

Venkateswararao Jujjuri (JV), [email protected]
Linux Symposium 2010

Page 2: VirtFS Ols2010

Paravirtual Applications and System Services
Move virtualization intelligence up into system services.
Explored by research and academic communities, but largely ignored by mainline development.
Provides a hybrid environment that combines security, isolation, and performance.
Visibility into guest operations allows the hypervisor to offer a variety of use cases: desktops, network sharing, file systems.
Avoids a layer of indirection and boosts performance.
Adding this to existing device virtualization takes virtualization to the next level.

Page 3: VirtFS Ols2010

Paravirtual File Systems
A good target as an entry into paravirtual system services.
Virtual storage in the form of virtual disks:
Can't be shared between multiple guests.
Redundant caching.
Unnecessary indirection between the file system and the block layer.
Traditional distributed/network file systems over a virtualized network device:
Configuration, management, and encapsulation overheads.
Double caching.
Different semantics for different file systems.

Page 4: VirtFS Ols2010

Use Cases of Paravirtual File Systems
Replace the virtual disk as the root file system: rapid cloning, easy management, security.
Sharing between host and guests.
Offer file system services to thin clients like LibraryOS.
Cloud computing:
A secure window into the host file system from the guest.
Different portions of the same file system shared among different guests.
Knowledge about guest activity enables the hypervisor to offer services like de-duplication and snapshots.
Better utilization of system resources.

Page 5: VirtFS Ols2010

VirtFS
Paravirtual file system pass-through between the KVM host and guest.
Uses the 9P protocol between client and server.
The 9P2000.L protocol is being developed/defined as part of this effort.
The server is part of QEMU and uses the VirtIO transport.
The file system is exported to the guest when QEMU is invoked.
The client is part of the guest kernel.
The file system is mounted on the guest using the mount tag defined at QEMU invocation.

Page 6: VirtFS Ols2010

Plan 9 Overview
The Plan 9 OS was developed at AT&T Bell Laboratories (now Alcatel-Lucent).
Its intention was to address Unix shortcomings: a seamless distributed system with integrated, secure network resource sharing.
Three core design principles:
A single set of simple, well-defined interfaces to services.
A simple protocol to securely distribute the interfaces across any network.
A dynamic hierarchical structure to organize these interfaces.
Unix pioneered the concept of treating devices like files; Plan 9 took the metaphor further by using file operations as the simple, well-defined interface to all system and application services.

Page 7: VirtFS Ols2010

9P Overview
9P is the protocol/abstract interface used to access resources under Plan 9.
Any transport can be used; the only requirement is that it be reliable and in-order.
Merged into the Linux kernel in 2.6.14, with major changes in 2.6.24.
Part of mainline Linux, with VirtIO transport support.
9P2000.u extension:
For POSIX adoption, the protocol was extended with the 9P2000.u version during the Linux port.
Provided support for numeric uid/gid and extended operations to support symlinks, hard links, special files, etc.
Did not include full support for Linux operations.
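To make the wire format concrete, here is a minimal POSIX shell sketch that serializes Tversion, the first message of every 9P session. It follows the 9P2000 framing: every message starts with size[4] type[1] tag[2], integers are little-endian, and strings carry a 2-byte length prefix. The helper names (`le`, `str9p`, `tversion`) are hypothetical, and the msize of 8192 is an arbitrary example value.

```shell
#!/bin/sh
# Sketch: serialize a 9P Tversion request.
# Layout: size[4] type[1] tag[2] msize[4] version[s], all little-endian,
# where a string s is len[2] followed by len bytes (ASCII assumed).

TVERSION=100   # T-message type for version negotiation
NOTAG=65535    # Tversion uses the reserved tag NOTAG (0xFFFF)

# le VALUE NBYTES: print VALUE as NBYTES little-endian bytes, written as
# "\0ooo" octal escape text so NUL bytes survive inside shell strings.
le() {
  value=$1 nbytes=$2 i=0
  while [ "$i" -lt "$nbytes" ]; do
    printf '\\0%03o' $(( ($value >> (8 * $i)) & 255 ))
    i=$(( i + 1 ))
  done
}

# str9p STRING: 2-byte length prefix, then the characters themselves.
# Assumes the string contains no backslashes (they would be read as escapes).
str9p() {
  le "${#1}" 2
  printf '%s' "$1"
}

# tversion MSIZE VERSION: write the binary Tversion message to stdout.
tversion() {
  msize=$1 version=$2
  # size counts the whole message, including the 4-byte size field itself
  size=$(( 4 + 1 + 2 + 4 + 2 + ${#version} ))
  printf '%b' "$(le "$size" 4)$(le "$TVERSION" 1)$(le "$NOTAG" 2)$(le "$msize" 4)$(str9p "$version")"
}

# Dump the 21 bytes of a Tversion asking for 9P2000.L with an 8 KiB msize
tversion 8192 9P2000.L | od -An -tx1
```

The escape-text trick keeps every byte, including NULs, out of shell variables until the final `printf '%b'`, which is the only step that emits raw binary.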

Page 8: VirtFS Ols2010

9P2000.L Protocol Extension
Aimed at addressing 9P2000.u protocol deficiencies while keeping the core protocol elements intact.
New opcodes that match the Linux VFS API, in a complementary name space.
Linux-native data formats (stat, permissions, etc.).
Support for xattrs, locking, quotas, etc.
9P2000.L coexists with the legacy protocol versions, with no changes to the existing operations.

The protocol version is negotiated during the initial handshake. If the server doesn't support the requested version, the client falls back.
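The sketch below illustrates both halves of this slide in POSIX shell. Part 1 lists a few T-message type numbers as they appear in present-day Linux `include/net/9p/9p.h` (the 2010 patches may have used other values): the 9P2000.L additions sit below 100 while the core 9P2000 messages start at 100, so a server can tell them apart by number alone. Part 2 models negotiation loosely after the 9P version rules: the server answers with the newest dialect it supports that is not newer than the client's request, or `unknown`. The preference order 9P2000.L > 9P2000.u > 9P2000 and all function names (`topcode`, `dialect_of_type`, `rversion`, `negotiate`) are assumptions for illustration.

```shell
#!/bin/sh
# Part 1: selected T-message numbers (present-day Linux include/net/9p/9p.h;
# the 2010 patches may differ). Each reply (R-message) is T + 1.
topcode() {
  case $1 in
    # 9P2000.L additions, all below 100
    lopen)     echo 12 ;;
    lcreate)   echo 14 ;;
    getattr)   echo 24 ;;
    setattr)   echo 26 ;;
    xattrwalk) echo 30 ;;
    readdir)   echo 40 ;;
    lock)      echo 52 ;;
    mkdir)     echo 72 ;;
    # core 9P2000 messages, 100 and up, still used by 9P2000.L clients
    version)   echo 100 ;;
    attach)    echo 104 ;;
    walk)      echo 110 ;;
    read)      echo 116 ;;
    write)     echo 118 ;;
    clunk)     echo 120 ;;
    *)         return 1 ;;
  esac
}

# Hypothetical dispatch rule: the two ranges never overlap.
dialect_of_type() {
  if [ "$1" -lt 100 ]; then echo 9P2000.L-only; else echo core-9P2000; fi
}

# Part 2: version negotiation. Dialects in assumed preference order.
DIALECTS="9P2000.L 9P2000.u 9P2000"

# Hypothetical server policy: newest supported dialect that is not newer
# than the request; "unknown" when there is no match.
rversion() {
  requested=$1 supported=$2 seen=no
  for d in $DIALECTS; do
    [ "$d" = "$requested" ] && seen=yes
    [ "$seen" = yes ] || continue
    case " $supported " in
      *" $d "*) echo "$d"; return 0 ;;
    esac
  done
  echo unknown
}

# Client side: adopt the server's answer, or fail if there is none.
negotiate() {
  reply=$(rversion "$1" "$2")
  if [ "$reply" = unknown ]; then
    echo "no common 9P dialect" >&2
    return 1
  fi
  echo "$reply"
}

t=$(topcode getattr)
echo "Tgetattr=$t Rgetattr=$(( t + 1 )) $(dialect_of_type "$t")"
negotiate 9P2000.L "9P2000.u 9P2000"   # older server: falls back to 9P2000.u
```

Keeping the new opcodes out of the legacy range is what lets the old operations stay unchanged: a server that only speaks 9P2000.u never sees a number it would misinterpret, because the client only sends .L opcodes after the server has agreed to 9P2000.L.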

Page 9: VirtFS Ols2010

KVM and QEMU
Kernel-based Virtual Machine (KVM):
A full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT-x / AMD-V).
A set of Linux kernel modules (kvm.ko, plus kvm-intel.ko or kvm-amd.ko) that offers a special process mode to user-space processes.
Quick EMUlator (QEMU):
Uses the interfaces provided by KVM to offer full system virtualization.
Emulates standard PC hardware such as IDE disks, VGA graphics, PCI devices, etc.
Any I/O request a guest OS makes is intercepted and routed to user mode to be emulated by the QEMU process.

Page 10: VirtFS Ols2010

VirtIO Transport
A paravirtual I/O bus based on a hypervisor-neutral DMA API.
Offers lockless ring queues between the guest and the host to enable zero-copy bulk data transfer.
The VirtIO PCI transport allows VirtFS to be implemented in such a way that guest-driven I/O operations can be zero-copy.

Page 11: VirtFS Ols2010

VirtFS Block Diagram
[Figure: Apps on Guest -> FS API -> guest kernel (VFS interface, VirtFS (v9fs) client) -> VirtIO ring -> host user space (VirtFS server, the v9fs server in QEMU) -> host kernel (VFS interface, file system) -> hardware]

Page 12: VirtFS Ols2010

VirtFS Implementation
KVM, QEMU, and VirtIO present an ideal platform for the VirtFS server.
Two types of virtual devices:
virtio-9p-pci: transports protocol messages and data between the host and the guest.
fsdev: defines the exported file system's characteristics, such as fs type and security model.
Command-line options:
-fsdev local,id=myid,path=/share_path/,security_model=mapped -device virtio-9p-pci,fsdev=myid,mount_tag=v_tag1
-virtfs local,path=/share_path/,security_model=passthrough,mount_tag=v_tag2
(-virtfs is a shorthand that sets up the -fsdev and -device pair in a single option.)
On the client:
mount -t 9p -o trans=virtio -o version=9p2000.L v_tag1 /mnt
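The command lines above tie three names together: the fsdev `id` links `-fsdev` to `-device`, and the `mount_tag` is what the guest passes to `mount`. A minimal sketch of two hypothetical helpers (`virtfs_args`, `guest_mount_cmd`) that derive both sides from one share description keeps those names consistent. They only print text; you would still append the host arguments to your own QEMU invocation.

```shell
#!/bin/sh
# Hypothetical helper: print the host-side QEMU arguments for one share.
# Usage: virtfs_args ID PATH SECURITY_MODEL MOUNT_TAG
virtfs_args() {
  id=$1 path=$2 model=$3 tag=$4
  # The same $id appears in both options; that is what pairs them up.
  printf '%s\n' "-fsdev local,id=$id,path=$path,security_model=$model" \
                "-device virtio-9p-pci,fsdev=$id,mount_tag=$tag"
}

# Hypothetical helper: print the matching mount command for the guest.
# Usage: guest_mount_cmd MOUNT_TAG MOUNTPOINT
guest_mount_cmd() {
  printf '%s\n' "mount -t 9p -o trans=virtio,version=9p2000.L $1 $2"
}

virtfs_args myid /share_path/ mapped v_tag1
guest_mount_cmd v_tag1 /mnt
```

The two `-o` options from the slide are merged into one comma-separated list, which `mount` treats the same way. Because the host output is meant to be word-split into QEMU's argument list, this sketch assumes share paths without spaces.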

Page 13: VirtFS Ols2010

Security
Two models of security enforcement:
One completely isolates the guest user domain from that of the host.
Eliminates the need for root squashing.
No setuid/setgid exposures.
Complete isolation enhances security.
Not very portable.
The other model shares user domains between the host and the guests.
Follows the traditional network file system model.
If not careful, it is susceptible to security holes.
Client-based security enforcement.
The server makes sure that client control never crosses the exported portion.
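Keeping the client inside the exported portion can be illustrated with a purely lexical check on client-supplied paths. This is a hypothetical helper (`inside_export`), not QEMU's code: it tracks directory depth across `.` and `..` components and rejects any path that climbs above the export root. It does not resolve symlinks, so on its own it is not sufficient; a real server must also refuse to follow links that point outside the share.

```shell
#!/bin/sh
# Hypothetical lexical check: does a client path stay inside the export root?
# Handles "." and ".." but does NOT follow symlinks (a real server must also
# guard against those). A leading "/" is treated as the export root.
inside_export() {
  depth=0 rc=0
  old_ifs=$IFS
  IFS=/
  set -f                      # no glob expansion of components like "*"
  for comp in $1; do
    case $comp in
      ''|.) ;;                # empty (from "//" or a leading "/") or "."
      ..)   depth=$(( depth - 1 ))
            if [ "$depth" -lt 0 ]; then rc=1; break; fi ;;
      *)    depth=$(( depth + 1 )) ;;
    esac
  done
  set +f
  IFS=$old_ifs
  return "$rc"
}

for p in a/b/../c ./a/./b a/.. ../etc/passwd a/../../x; do
  if inside_export "$p"; then echo "ok      $p"; else echo "refused $p"; fi
done
```

9P clients walk one path component at a time, so a server can apply the same depth rule incrementally to each name in a walk request instead of to a whole path string.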

Page 14: VirtFS Ols2010

Security Model - Mapped
The VirtFS server intercepts and maps all file object create and get/set attribute requests from the client.
Files are created with the VirtFS server's user credentials.
Client user credentials are stored in extended attributes.
Extended user attributes are allowed only on regular files and directories, so for special files the server creates corresponding regular files and records the appropriate mode bits in extended attributes.
This enhances security: the guest user domain is completely isolated from the host's.
Symlinks can't be followed locally on the file server.
The exported file system becomes "VirtFS-ized": the guest-visible ownership and file types exist only in VirtFS's extended attributes.
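A dry-run sketch of what the mapped model does on a create request: the host object is always a plain directory (mode 0700) or regular file (mode 0600) owned by the server, and the guest's uid, gid, and full st_mode, including the file-type bits, go into extended attributes. The attribute names `user.virtfs.uid`, `user.virtfs.gid`, and `user.virtfs.mode` follow later QEMU releases and are an assumption for the 2010 code; QEMU stores the values in binary, whereas this sketch only prints them as text. `type_bits` and `mapped_plan` are hypothetical helpers.

```shell
#!/bin/sh
# Dry-run sketch of the "mapped" security model: print what the server would
# create on the host for a guest create request. Nothing is touched on disk.

# S_IF* file-type bits from <sys/stat.h>, in octal.
type_bits() {
  case $1 in
    fifo) echo 0010000 ;;
    chr)  echo 0020000 ;;
    dir)  echo 0040000 ;;
    blk)  echo 0060000 ;;
    reg)  echo 0100000 ;;
    lnk)  echo 0120000 ;;
    sock) echo 0140000 ;;
    *)    return 1 ;;
  esac
}

# mapped_plan NAME TYPE PERM UID GID   (PERM in octal, e.g. 644)
mapped_plan() {
  name=$1 type=$2 perm=$3 uid=$4 gid=$5
  bits=$(type_bits "$type") || return 1
  # Guest-visible st_mode = file-type bits | permission bits (octal math).
  mode=$(printf '%o' $(( bits | 0$perm )))
  if [ "$type" = dir ]; then
    echo "host: mkdir $name (mode 0700, owned by the VirtFS server)"
  else
    echo "host: create regular file $name (mode 0600, owned by the VirtFS server)"
  fi
  # Assumed attribute names; real QEMU writes binary values, not text.
  echo "xattr user.virtfs.uid=$uid"
  echo "xattr user.virtfs.gid=$gid"
  echo "xattr user.virtfs.mode=$mode"
}

# A guest FIFO with mode 0644 becomes a plain 0600 file on the host
mapped_plan afifo fifo 644 611 611
```

Because a guest symlink is likewise stored as an ordinary file carrying S_IFLNK in its mode attribute, the host kernel never sees a real symlink, which is why symlinks cannot be followed locally under this model.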

Page 15: VirtFS Ols2010

Security Model – Mapped (cont.)
On host (ls -l output):
drwx------. 2 virfsuid virtfsgid 4096 2010-05-11 09:19 adir
-rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:36 afifo
-rw-------. 2 virfsuid virtfsgid 0 2010-05-11 09:19 afile
-rw-------. 2 virfsuid virtfsgid 0 2010-05-11 09:19 alink
-rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:57 asocket1
-rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:32 blkdev
-rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:33 chardev
-rw-------. 1 root root 6 2010-05-11 09:20 asymlink

On guest (ls -l output):
drwxr-xr-x 2 guestuser guestuser 4096 2010-05-11 12:19 adir
prw-r--r-- 1 guestuser guestuser 0 2010-05-11 12:36 afifo
-rw-r--r-- 2 guestuser guestuser 0 2010-05-11 12:19 afile
-rw-r--r-- 2 guestuser guestuser 0 2010-05-11 12:19 alink
srwxr-xr-x 1 guestuser guestuser 0 2010-05-11 12:57 asocket1
brw-r--r-- 1 guestuser guestuser 0, 0 2010-05-11 12:32 blkdev
crw-r--r-- 1 guestuser guestuser 4, 5 2010-05-11 12:33 chardev
lrwxrwxrwx 1 root root 6 2010-05-11 12:20 asymlink -> afile

Page 16: VirtFS Ols2010

Security Model – Passthrough
All requests are passed directly to the underlying file system without any interception.
File system objects on the file server are created with the client user's credentials. Two methods to do this:
setuid/setgid during creation.
chmod/chown immediately after creation.
All special files are created as-is.
Portable between NFS/CIFS.
Susceptible to security issues: the client's root can create files on the file server with root privileges if the file server is running as root.
Symlinks can be followed locally.
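The second passthrough method, creating the object and then immediately fixing its ownership and mode, can be sketched in shell as follows. `passthrough_create` is a hypothetical helper for regular files only. `set -C` (noclobber) makes the create fail if the file already exists, loosely mirroring O_EXCL, and `chown` to another user only succeeds when the server runs as root, which is exactly the exposure this slide warns about.

```shell
#!/bin/sh
# Sketch of passthrough "create, then chown/chmod" for a regular file.
# Usage: passthrough_create PATH UID GID MODE   (MODE in octal, e.g. 644)
passthrough_create() {
  path=$1 uid=$2 gid=$3 mode=$4
  # Create with a restrictive umask so the file is never briefly readable
  # by others; noclobber (set -C) refuses to truncate an existing file.
  ( umask 077 && set -C && : > "$path" ) || return 1
  # Hand the file to the client user; needs root unless uid/gid are our own.
  chown "$uid:$gid" "$path" || return 1
  chmod "$mode" "$path"
}

# Example: create a file for the current user in a scratch directory
dir=$(mktemp -d)
passthrough_create "$dir/file1" "$(id -u)" "$(id -g)" 644
ls -ln "$dir/file1"
rm -rf "$dir"
```

Between the create and the `chown`, the file briefly belongs to the server's user; creating it under the client's credentials in the first place (the setuid/setgid method) avoids that window at the cost of switching identities around every create.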

Page 17: VirtFS Ols2010

Security Model – Passthrough (cont.)
On host:
# grep 611 /etc/passwd
hostuser:x:611:611::/home/hostuser:/bin/bash
# ls -l
-rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 file1
-rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 link1
srwxrwxr-x. 1 hostuser hostuser 0 2010-05-12 18:27 mysock
lrwxrwxrwx. 1 hostuser hostuser 5 2010-05-12 18:25 symlink1 -> file1

On guest:
$ grep 611 /etc/passwd
guestuser:x:611:611::/home/guestuser:/bin/bash
$ ls -l
-rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 file1
-rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 link1
srwxrwxr-x 1 guestuser guestuser 0 2010-05-12 21:27 mysock
lrwxrwxrwx 1 guestuser guestuser 5 2010-05-12 21:25 symlink1 -> file1

Page 18: VirtFS Ols2010

ACL Implementation
Access Control Lists (ACLs) allow fine-grained control.
No universal standard. Linux offers POSIX ACLs, but they are not versatile/rich enough to support NFSv4.
A Rich ACL patch set for Linux is on the mailing list.
Strategy for VirtFS:
Enforcement at the client.
Support only one ACL model.
Start with POSIX ACLs.
Help Rich ACLs make it into mainline.
Convert to Rich ACLs once they are available in mainline.

Page 19: VirtFS Ols2010

Where are we?
The VirtFS server is in QEMU mainline.
The security model patch set has been accepted into QEMU mainline, as part of QEMU 0.13.
Several patches made it into mainline Linux.
Fedora 13 and Ubuntu Lucid mount VirtFS (9P2000.u).
Making good progress on 9P2000.L: implemented all the VFS calls required to satisfy the Tuxera POSIX test suite. These patches are either on the list or already accepted.
A patch set to generalize the worker-thread infrastructure in QEMU is in mainline. Work is under way to convert the current single-threaded server into a multi-threaded one using that infrastructure.
Working on POSIX ACLs and the byte-range lock implementation.

Page 20: VirtFS Ols2010

Performance Tests
Plain and simple: dd to compare.
Write:
dd if=/dev/zero of=/mnt/fileX bs=<blocksize> count=<count>
Read:
dd if=/mnt/fileX of=/dev/null bs=<blocksize> count=<count>
blocksize: 8k, 64k, 2M.
count: number of blocks needed for 8 GB worth of I/O.
All tests are conducted on the guest.
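The block counts follow from dividing the fixed total by each block size. A small sketch that prints the write and read commands for the three block sizes, assuming "8 GB" means 8 GiB (2^33 bytes) and keeping the slide's /mnt/fileX path; `dd_count` is a hypothetical helper.

```shell
#!/bin/sh
# Print the dd write/read commands for each block size.
# Assumption: "8 GB" = 8 GiB = 2^33 bytes; /mnt/fileX is on the share under test.
TOTAL=$(( 8 * 1024 * 1024 * 1024 ))

# dd_count BLOCKSIZE_BYTES: number of blocks needed to move TOTAL bytes.
dd_count() {
  echo $(( TOTAL / $1 ))
}

for bs in 8192 65536 2097152; do      # 8k, 64k, 2M
  count=$(dd_count "$bs")             # 1048576, 131072, 4096 respectively
  echo "dd if=/dev/zero of=/mnt/fileX bs=$bs count=$count"
  echo "dd if=/mnt/fileX of=/dev/null bs=$bs count=$count"
done
```

Piping the output into `sh` inside the guest runs the commands in order. A read issued right after its matching write may be partly served from the guest's page cache unless caches are dropped in between.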

Page 21: VirtFS Ols2010

Comparison with NFS and CIFS

[Charts: Sequential Read, Sequential Write]

Page 22: VirtFS Ols2010

Comparison with blockdev

[Charts: Sequential Write, Sequential Read]

Page 23: VirtFS Ols2010

Next Steps
Fully Linux-compliant, complete 9P2000.L protocol.
ACL implementation.
Page cache sharing between host and guest(s).
dcache sharing between host and guest(s).
Interfacing with other file system APIs.
Enable consistent caching.
NFS and CIFS exportability.
Making it a root file system for guests instead of using root volumes.
Ongoing stability, scalability, and performance improvements.

Page 24: VirtFS Ols2010

Conclusions
Huge potential for specialized file systems in the virtualization space.
Growth in the cloud space will be a major catalyst.
Lots of scope for innovation.
A step towards paravirtual system services.
Related Work
XenFS
Shared Folders (VMHGFS)
Lguest 9P support (virtio gateway to spfs)
Plan 9 kernel KVM and Lguest support (Sandia)
Foundation: Venti for storage, a content-addressable back-end for VMware (MIT)

Page 25: VirtFS Ols2010

Questions?