MIRAGEOS 2.0: BRANCH CONSISTENCY FOR XEN STUB DOMAINS

Dave Scott (Citrix Systems)
Thomas Gazagnaire (University of Cambridge)
Anil Madhavapeddy (University of Cambridge)
@mugofsoup @eriangazag @avsm

http://openmirage.org
http://decks.openmirage.org/xendevsummit14/
XPDS14: MirageOS 2.0: branch consistency for Xen Stub Domains - Anil Madhavapeddy, University of Cambridge
THIS XEN DEV SUMMIT TALK

We focus on:
- how we have been using Mirage to improve the core Xenstore toolstack using Irmin;
- a performance and distribution future for Xenstore;
- plans for upstreaming our patches.
But first, some background...
IRMIN: MIRAGE 2.0 STORAGE

Irmin is our library database that follows the modular design principles of MirageOS: https://github.com/mirage/irmin

- Runs in both userspace and kernelspace
- A key = value store (sound familiar?)
- Git-style: commit, branch, merge
- Preserves history by default
- Backend support for in-memory, Git and HTTP/REST stores
Mirage unikernels thus version control all their data, and have a distributed provenance graph of all activities.
- Append-only and easily distributed.
- Provides stable serialisation of structured values.
- Backend-independent storage:
  - memory or on-disk persistence
  - encryption or plaintext
- Position- and architecture-independent pointers, such as SHA1 checksums of blocks.
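To make position-independent pointers concrete, here is a minimal sketch of a content-addressed block store in plain OCaml. It is an illustration under stated assumptions, not Irmin's implementation: it uses the standard library's `Digest` (MD5) as a stand-in for SHA1, and the `Block_store` module and its functions are hypothetical.

```ocaml
(* Content-addressed block store: a block's key is a digest of its contents.
   Assumption: Digest (MD5, from the OCaml stdlib) stands in for SHA1 here. *)
module Block_store = struct
  type key = string

  type t = (key, string) Hashtbl.t

  let create () : t = Hashtbl.create 16

  (* The key depends only on the contents, never on where or on which
     architecture the block is stored, so it works as a pointer anywhere.
     Adding the same contents twice stores a single block. *)
  let add (store : t) (contents : string) : key =
    let k = Digest.to_hex (Digest.string contents) in
    if not (Hashtbl.mem store k) then Hashtbl.add store k contents;
    k

  let find (store : t) (k : key) : string option = Hashtbl.find_opt store k
end

let () =
  let s = Block_store.create () in
  (* A "directory" block can point at a value block by embedding its key. *)
  let v = Block_store.add s "running" in
  let dir = Block_store.add s ("state -> " ^ v) in
  assert (Block_store.add s "running" = v);   (* same contents, same key *)
  assert (Hashtbl.length s = 2);
  assert (Block_store.find s dir = Some ("state -> " ^ v))
```

Because keys are derived from contents, the store is naturally append-only and deduplicated, which is what makes it easy to distribute.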
BASE CONCEPTS: HISTORY DAG (OR THE "GIT STORE")

- Append-only and easily distributed.
- Can be stored in the Object DAG store.
- Keeps track of history.
- Ordered audit log of all operations.
- Useful for merge (a 3-way merge is easier than a 2-way merge).
- Snapshots and reverting operations for free.
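Why a 3-way merge is easier than a 2-way merge can be seen on a single value. The `merge3` helper below is a hypothetical sketch, not Irmin's merge API: given only `ours` and `theirs` we cannot tell which side changed a value, but the common ancestor `base`, recovered from the history DAG, settles it.

```ocaml
(* Hypothetical three-way merge of one value; not Irmin's API.
   The common ancestor [base] tells us which side made a change. *)
let merge3 ~base ~ours ~theirs =
  if ours = theirs then `Ok ours            (* both sides agree *)
  else if ours = base then `Ok theirs       (* only "theirs" changed it *)
  else if theirs = base then `Ok ours       (* only "ours" changed it *)
  else `Conflict "both sides changed the value differently"

let () =
  (* A 2-way merge of "a" vs "b" cannot decide; with base "a" we know
     only the other side changed the value, so "b" wins. *)
  assert (merge3 ~base:"a" ~ours:"a" ~theirs:"b" = `Ok "b");
  assert (merge3 ~base:"a" ~ours:"c" ~theirs:"b"
          = `Conflict "both sides changed the value differently")
```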
BASE CONCEPTS
IRMIN TOOLING

    opam update && opam install irmin

Command-line frontend that uses:
- storage: in-memory format or Git
- network: custom format, Git or HTTP/REST
- interface: JSON interface for storing content easily

OCaml library that supplies:
- merge-friendly data structures
- backend implementations (Git, HTTP/REST)
XENSTORE: VM METADATA

Xenstore is our configuration database that stores VM metadata in directories (à la Plan 9).

- Runs in either userspace or kernelspace (just like Mirage)
- A key = value store (just like Irmin)
- Logs history by default (just like Irmin...)
- TRANSACTION_START → branch; TRANSACTION_END → merge
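A minimal sketch of the TRANSACTION_START → branch, TRANSACTION_END → merge idea, assuming the store is a pure map from Xenstore paths to values. The `store`, `txn`, `start`, `write` and `commit` names are hypothetical; this is not the Irmin Xenstore code. Each transaction remembers the snapshot it branched from and commits with a path-by-path three-way merge.

```ocaml
module SMap = Map.Make (String)

(* The shared store: the current head of the "master branch". *)
type store = { mutable head : string SMap.t }

(* A transaction is a branch: the snapshot it started from (its origin)
   plus a private view that accumulates its writes. *)
type txn = { origin : string SMap.t; mutable view : string SMap.t }

(* TRANSACTION_START: branch off the current head. *)
let start (s : store) : txn = { origin = s.head; view = s.head }

let write (t : txn) (path : string) (v : string) : unit =
  t.view <- SMap.add path v t.view

(* Three-way merge of one optional value ([None] = absent path).
   Returns [None] on a true conflict. *)
let merge3 ~base ~ours ~theirs =
  if ours = theirs then Some ours
  else if ours = base then Some theirs
  else if theirs = base then Some ours
  else None

(* TRANSACTION_END: merge the branch into the head, path by path.
   Returns [false] and leaves the head untouched on a true conflict. *)
let commit (s : store) (t : txn) : bool =
  let keys m acc = SMap.fold (fun k _ acc -> k :: acc) m acc in
  let all = List.sort_uniq compare (keys t.origin (keys t.view (keys s.head []))) in
  let rec go acc = function
    | [] -> s.head <- acc; true
    | k :: rest ->
      let get m = SMap.find_opt k m in
      (match merge3 ~base:(get t.origin) ~ours:(get t.view) ~theirs:(get s.head) with
       | None -> false
       | Some None -> go acc rest
       | Some (Some v) -> go (SMap.add k v acc) rest)
  in
  go SMap.empty all

let () =
  let s = { head = SMap.empty } in
  let t1 = start s and t2 = start s in
  write t1 "/local/domain/1/name" "vm1";
  write t2 "/local/domain/2/name" "vm2";
  assert (commit s t1);
  (* t2 branched before t1 committed, but touched a different path. *)
  assert (commit s t2);
  assert (SMap.cardinal s.head = 2)
```

Two transactions creating different domains both commit, even though neither saw the other's write; only different writes to the same path are reported as conflicts.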
The "original plan" in 2002 was for seamless distribution across hosts/clusters/clouds. What happened? Unfortunately, the previous transaction implementations all suck.
XENSTORE: CONFLICTS

- Terrible performance impact: a transaction can involve 100 RPCs to set it up (one per read/write op), only to be aborted and retried.
- Longer-lived transactions have a greater chance of conflict than shorter ones, and the whole long transaction must then be repeated.
- Concurrent transactions can lead to live-lock:
  - try starting lots of VMs in parallel!
  - much time was wasted removing transactions (from xend).
XENSTORE: CONFLICTS

Conflicts between Xenstore transactions are so devastating that we try hard to avoid transactions altogether. However, they aren't going away.
XENSTORE: CONFLICTS

Observe: typical Xenstore transactions (e.g. creating domains) shouldn't conflict; the real problem is a flawed merging algorithm. If we were managing domain configurations in git, we would simply merge or rebase and it would work. Therefore the Irmin Xenstore simply does:
```ocaml
DB.View.merge_path ~origin db [] transaction >>= function
| `Ok () -> return true
| `Conflict msg ->
  (* If the merge doesn't work, try a rebase *)
  DB.View.rebase_path ~origin db [] transaction >>= function
  | `Ok () -> return true
  | `Conflict msg ->
    (* A true conflict: tell the client *)
    ...
```
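The rebase fallback can be illustrated under one assumption about what a transaction records: an ordered log of the reads it observed and the writes it made. The sketch below uses hypothetical names and is not Irmin's `View.rebase_path`. It replays that log on the current head and reports a true conflict only if a value the transaction read has since changed, so a transaction that merely wrote a path can still succeed after a plain merge has failed.

```ocaml
module SMap = Map.Make (String)

(* Assumed transaction log: what the transaction read and wrote, in order. *)
type op =
  | Read of string * string option   (* path, value observed (None = absent) *)
  | Write of string * string         (* path, value written *)

(* Replay [log] on top of the current [head]. A read that now sees a
   different value is a true conflict ([None]); otherwise the writes are
   applied and the new head is returned. *)
let rebase (head : string SMap.t) (log : op list) : string SMap.t option =
  let rec replay m = function
    | [] -> Some m
    | Read (p, seen) :: rest ->
      if SMap.find_opt p m = seen then replay m rest else None
    | Write (p, v) :: rest -> replay (SMap.add p v m) rest
  in
  replay head log

let () =
  let head = SMap.(add "/vm/x/state" "running" empty) in
  (* A transaction that only wrote the path replays cleanly. *)
  (match rebase head [Write ("/vm/x/state", "paused")] with
   | Some m -> assert (SMap.find "/vm/x/state" m = "paused")
   | None -> assert false);
  (* One that read a value which has since changed is a true conflict. *)
  assert (rebase head [Read ("/vm/x/state", Some "halted")] = None)
```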
XENSTORE: PERFORMANCE
XENSTORE: TRANSACTIONS

Big transactions give you high-level intent:
- useful for debugging and tracing
- minimise merge commits (1 per transaction)
- minimise backend I/O (1 op per commit)
- after a crash during a transaction, Xenstore can tell the client to "abort, retry"
Solving the performance problems that big transactions had in previous implementations greatly improves the overall health of Xenstore.
XENSTORE: RELIABILITY

What happens if Xenstore crashes?
- Rings are left full of partially read/written packets.
- There is no reconnection protocol in common use: there is a proposal on xen-devel, but it will be years before we can rely on it.
- Per-connection state in Xenstore: watch registrations, pending watch events.

If Xenstore is restarted, many of the rings will be broken... you'll probably have to reboot the host.
XENSTORE: RELIABILITY

Irmin to the rescue!
- Data structure libraries built on top of Irmin, for example mergeable queues. Use these for (e.g.) pending watch events.
- We can persist partially read/written packets so fragments can be recovered across a restart.
- We can persist connection information (i.e. ring information from an Introduce) and auto-reconnect on start.
- Added bonus: state is easy to introspect via xenstore-ls; you can see each registered watch, queue, etc.
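As an example of a mergeable data structure for pending watch events, here is a sketch of a queue with a three-way merge. The assumptions are loud ones: every event is unique (e.g. tagged with a sequence number), and the `event` type and `merge` function are hypothetical rather than the Irmin library's queue. An event popped on either branch stays popped, and events pushed on either branch are all kept.

```ocaml
(* Assumption: each pending watch event is unique, e.g. tagged with a
   sequence number, so structural membership identifies it. *)
type event = { seq : int; path : string }

(* A queue is a list of events, oldest first. *)
type queue = event list

(* Three-way merge: an event popped on either branch stays popped;
   events pushed on either branch are appended, ours before theirs. *)
let merge ~(base : queue) ~(ours : queue) ~(theirs : queue) : queue =
  let kept = List.filter (fun e -> List.mem e ours && List.mem e theirs) base in
  let pushed q = List.filter (fun e -> not (List.mem e base)) q in
  kept @ pushed ours @ pushed theirs

let () =
  let e n = { seq = n; path = "/local/domain/" ^ string_of_int n } in
  let base = [e 1; e 2] in
  let ours = [e 2; e 3] in            (* popped event 1, pushed event 3 *)
  let theirs = [e 1; e 2; e 4] in     (* pushed event 4 *)
  let merged = merge ~base ~ours ~theirs in
  assert (List.map (fun x -> x.seq) merged = [2; 3; 4])
```

Putting our pushes before theirs is an arbitrary but deterministic choice; any fixed order keeps merges reproducible.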
XENSTORE: TRACING

When a bug is reported, the normal procedure is:
- stare at Xenstore logs for a very long time
- slowly deduce the state at the time the bug manifested
- (swearing and cursing is strictly optional)

With Irmin+Xenstore, one can simply:
- git checkout the revision
- inspect the state with ls
- in the future: git bisect automation!
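Automating git bisect amounts to a binary search over history for the first revision where a check fails. A minimal sketch, assuming the revisions are ordered oldest-first and that the hypothetical `is_bad` predicate (for instance, "is this domain's entry missing?") is false up to some revision and true from then on:

```ocaml
(* Find the first "bad" revision by binary search.
   Assumptions: [revs] is ordered oldest-first, and the hypothetical
   predicate [is_bad] is false up to some revision and true afterwards. *)
let bisect (revs : 'a array) (is_bad : 'a -> bool) : 'a option =
  (* Invariant: revisions before [lo] are good; [hi] is bad or past the end. *)
  let rec go lo hi =
    if lo >= hi then (if hi < Array.length revs then Some revs.(hi) else None)
    else
      let mid = lo + ((hi - lo) / 2) in
      if is_bad revs.(mid) then go lo mid else go (mid + 1) hi
  in
  go 0 (Array.length revs)

let () =
  (* Revisions 0..9, with a bug that first appears at revision 6. *)
  let revs = Array.init 10 (fun i -> i) in
  assert (bisect revs (fun r -> r >= 6) = Some 6);
  assert (bisect revs (fun _ -> false) = None)
```

Each step checks out one revision, so finding the culprit in n revisions takes about log2(n) checks.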
XENSTORE: DATA STORAGE

Xenstore contains VM metadata (/vm) and domain metadata (/local/domain). But VM metadata is duplicated elsewhere and copied in/out:
- xl config files, and the xapi database
- (insert cloud toolstack here)

With current daemons, it is unwise to persist large data.

What if Xenstore could store and distribute this data efficiently, and if application data could be persisted reliably?
XENSTORE: THE DATA

Irmin to the rescue!
- Check in VM metadata to Irmin; clone, pull and push to move it between hosts.
- Expose it to the host via FUSE, for Plan 9 filesystem goodness; maybe one day even echo start > VM/uuid/ctl
- FUSE code at
- VM data could be checked in to Irmin: very important for unikernels that have no native storage.
XENSTORE: UPSTREAMING

An advanced prototype exists using Mirage libraries, but it doesn't fully pass the unit test suite. Before upstreaming:
- Write a fixed-size backend for block devices. Preserving history is a good default, but history does need to be squashed from time to time.
- Upstream patches:
  - switch to using opam to build Xenstore
    - reproducible builds via a custom Xen remote
    - allows using modern OCaml libraries (Lwt, Mirage, etc.)
- In Xapi, delete the existing db and replace it with Xenstore 2.0.
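Squashing history can be sketched under a simple assumption about the backend layout: a base snapshot plus a list of per-commit diffs, oldest first. The hypothetical `squash` below folds all but the newest `keep` diffs into the base, bounding the size of the history while leaving the current state unchanged; it is not the planned block-device backend.

```ocaml
module SMap = Map.Make (String)

(* Assumed layout: a base snapshot plus per-commit diffs, oldest first.
   A diff maps paths to a new value ([Some v]) or a deletion ([None]). *)
type diff = (string * string option) list
type history = { base : string SMap.t; commits : diff list }

let apply (m : string SMap.t) (d : diff) : string SMap.t =
  List.fold_left
    (fun m (k, v) ->
       match v with Some v -> SMap.add k v m | None -> SMap.remove k m)
    m d

(* The current state: the base with every commit replayed in order. *)
let head (h : history) = List.fold_left apply h.base h.commits

(* Fold all but the newest [keep] commits into the base snapshot. *)
let squash ~keep (h : history) : history =
  let n = List.length h.commits in
  let rec split i acc = function
    | rest when i = 0 -> (List.rev acc, rest)
    | [] -> (List.rev acc, [])
    | d :: rest -> split (i - 1) (d :: acc) rest
  in
  let old, recent = split (max 0 (n - keep)) [] h.commits in
  { base = List.fold_left apply h.base old; commits = recent }

let () =
  let h =
    { base = SMap.empty;
      commits = [ [ ("/vm/a", Some "1") ];
                  [ ("/vm/b", Some "2") ];
                  [ ("/vm/a", None) ] ] } in
  let h' = squash ~keep:1 h in
  assert (List.length h'.commits = 1);
  (* Intermediate revisions are gone, but the current state is unchanged. *)
  assert (SMap.equal String.equal (head h) (head h'))
```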
XENSTORE: CODE

Prototype + unit tests at:
https://github.com/mirage/ocaml-xenstore-server
(can build without Xen on Mac OS X now)