Page 1
© 2016 Mesosphere, Inc. All Rights Reserved. 1
Process Migration in the Orchestration World
ContainerCon 2016 - Jimenez, Arya
Isabel Jimenez
Distributed Systems Engineer
DC/OS Security Team & Apache Mesos Contributor
[email protected] @ijimene
Kapil Arya
Distributed Systems Engineer
Apache Mesos Committer & DMTCP Developer
[email protected] @karya0
Page 2
Overview
➢ Motivation
➢ Process Migration
➢ Apache Mesos
➢ Process/Container Migration for Mesos
➢ Demo
Page 3
Motivation
Page 4
Stateless vs. Stateful Applications
● Stateless applications:
  ○ No local state
  ○ Start from a (relatively) vanilla state
  ○ Perform transaction(s)
  ○ Kill when no longer needed
● Stateful applications:
  ○ Some local state
  ○ Start from vanilla state and compute “work” state
  ○ Non-graceful shutdown results in loss of compute time
Page 5
Scheduling Stateless vs. Stateful Applications
● Stateless applications:
  ○ Scale up: “on-demand” deployment by launching clones as needed
  ○ Scale down: kill unused instances without loss of computation time
  ○ Making room for high-priority tasks carries no significant penalty
● Stateful applications:
  ○ Scale up: longer initialization times for new instances
  ○ Scale down: wait for instances to reach a “safe” state to preserve compute cycles
  ○ Making room for high-priority tasks results in a significant compute-time penalty
The same applies when moving applications from one node/cluster to another!
Page 6
Scheduling Stateless vs. Stateful Applications
Modern container orchestration tools are optimized for stateless applications!
Page 7
How to Better Schedule Stateful Applications?
Make them stateless!
● How?
  ○ Rewrite ‘em!
● Alternatively
  ○ Use process/container checkpointing and migration!
Page 8
Process Migration
Page 9
Terminology
● Process migration
  ○ Move a running process from one node to another
● Container migration
  ○ Move a running container from one node to another
● Virtual machine migration (e.g., vMotion)
  ○ Move a running virtual machine from one node to another
Page 10
How to Migrate a Process/Container/VM?
1. Pause the running process/container/VM
2. Take a snapshot of the current state (a.k.a. checkpointing)
3. Move the snapshot to the target node
4. Restart from the snapshot on the target node
Do this transparently to the outside world!
● Ensure minimal downtime
  ○ Reduce time required for stages (2) and (3)
  ○ Ideally on the order of milliseconds!
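A back-of-the-envelope model shows why stages (2) and (3) are the ones to shrink. The sketch below is illustrative only: the function names and the example numbers (image size, link speed, stage timings) are assumptions, not measurements from the talk, and it assumes the stages run strictly one after another.

```python
def transfer_seconds(image_bytes: int, link_bits_per_second: float) -> float:
    """Time to move a snapshot of image_bytes over the network (stage 3)."""
    return image_bytes * 8 / link_bits_per_second


def downtime_seconds(snapshot_s: float, image_bytes: int,
                     link_bits_per_second: float, restart_s: float) -> float:
    """Pause-to-resume time when snapshot, transfer, and restart are sequential."""
    return snapshot_s + transfer_seconds(image_bytes, link_bits_per_second) + restart_s


if __name__ == "__main__":
    # Hypothetical numbers: a 125 MB image over a 1 Gbit/s link.
    print(transfer_seconds(125_000_000, 1e9))            # 1.0 (seconds for the copy alone)
    print(downtime_seconds(0.5, 125_000_000, 1e9, 0.5))  # 2.0
```

Even with these modest numbers the copy alone takes a full second, so millisecond-scale downtime needs a smaller snapshot or a transfer that overlaps with execution.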
Page 11
What is Checkpointing?
Checkpoint-Restart is the ability to save a set of running processes to a checkpoint-image on disk, and to later restart it from disk.
● A quick demo!
Page 12
Checkpointing Use Cases
● Fault tolerance
● Scheduling and process migration
● Debugging (an executable bug report)
● Faster startup times (checkpoint after initialization)
● Save/restore workspace (for interactive sessions)
● Speculative execution (what-if scenarios)
● Managing long tails (single thread continues to run after other threads have exited)
Page 13
Stateful Applications with Checkpointing
Stateful Application + Checkpointing ≈ Stateless Application
● Scale up: start from pre-initialized snapshot
● Scale down: checkpoint and kill
● Migrate: checkpoint, kill, and restart
Page 14
How to Checkpoint/Restart a Process?
Checkpoint-restart involves saving and restoring:
● all of user-space memory
● state of all threads
● kernel state
● network state
● …
All this while ensuring that the state doesn’t change while the checkpoint is being taken!
● Quiesce the process(es) before saving the state!
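The reason for quiescing can be shown with an in-process analogy. The sketch below is not how a system-level checkpointer works (those stop threads from outside the application); it is a toy, with hypothetical names, in which a lock plays the role of "quiesce" so that a snapshot never captures a half-finished update.

```python
import copy
import threading


class Worker:
    """Toy 'process' whose state has two fields that must stay consistent."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.state = {"steps": 0, "total": 0}

    def step(self) -> None:
        # Each update touches both fields; the invariant is total == 2 * steps.
        with self._lock:
            self.state["steps"] += 1
            self.state["total"] += 2

    def checkpoint(self) -> dict:
        # Quiesce: holding the lock keeps every mutator out while we copy.
        with self._lock:
            return copy.deepcopy(self.state)


def run_demo(num_threads: int = 4, steps_per_thread: int = 1000) -> list:
    """Take snapshots while worker threads are (possibly) still running."""
    worker = Worker()

    def work() -> None:
        for _ in range(steps_per_thread):
            worker.step()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    snapshots = [worker.checkpoint() for _ in range(100)]
    for t in threads:
        t.join()
    snapshots.append(worker.checkpoint())  # final snapshot after all work is done
    return snapshots
```

Every snapshot returned by `run_demo()` satisfies `total == 2 * steps`; without the lock in `checkpoint()`, a copy could land between the two field updates.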
Page 15
Different Types of Checkpointing
● Application-level
  ○ Embed checkpointing code inside the application itself
  ○ Optimal
  ○ Burden on the application developer
● Virtual machine level
  ○ Complete state
  ○ Higher cost
● System-level
  ○ No modification to application source/binary
  ○ Can be done at the kernel-level or in the user-space
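As an illustration of the application-level approach, here is a minimal sketch in which the application itself saves and reloads its progress. The function name, the state layout, and the use of `pickle` are assumptions chosen for the example; the `stop_after` parameter only simulates the process being killed partway through.

```python
import os
import pickle
import tempfile


def run(total_steps, ckpt_path, stop_after=None):
    """Sum 0..total_steps-1, saving progress after every step.

    stop_after simulates the process being killed once that many steps are done.
    """
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)      # restart: resume from the saved state
    else:
        state = {"step": 0, "acc": 0}   # fresh start from the vanilla state

    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            break
        state["acc"] += state["step"]
        state["step"] += 1
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, ckpt_path)  # atomic rename: never a half-written image
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "work.ckpt")
        print(run(10, path, stop_after=4))  # {'step': 4, 'acc': 6}
        print(run(10, path))                # {'step': 10, 'acc': 45}
```

The second call picks up at step 4 instead of recomputing from scratch, which is the benefit; the save/load logic the developer has to write and maintain is the burden.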
Page 16
Modern Checkpointing Systems
● CRIU (Checkpoint/Restore In Userspace)
  ○ Single-node checkpointing
  ○ Recent kernels (3.9+)
  ○ Container-level
  ○ http://criu.org/
● DMTCP (Distributed MultiThreaded CheckPointing)
  ○ User-space libraries with LD_PRELOAD
  ○ Distributed processes across multiple nodes
  ○ http://dmtcp.sourceforge.net
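For orientation, the sketch below builds the basic command lines of both tools without running them. The helper function names, the PID, the directory, and the image file name are hypothetical placeholders; the commands and flags themselves (`criu dump`/`criu restore` with `--tree`, `--images-dir`, `--shell-job`, and `dmtcp_launch`, `dmtcp_command --checkpoint`, `dmtcp_restart`) are the tools' standard entry points, but check the documentation of the installed version before relying on them.

```python
import shlex


def criu_dump_cmd(pid: int, images_dir: str) -> list:
    # --shell-job is needed when the process was started from a terminal.
    return ["criu", "dump", "--tree", str(pid),
            "--images-dir", images_dir, "--shell-job"]


def criu_restore_cmd(images_dir: str) -> list:
    return ["criu", "restore", "--images-dir", images_dir, "--shell-job"]


def dmtcp_launch_cmd(app_argv: list) -> list:
    # The application runs under DMTCP from the start.
    return ["dmtcp_launch", *app_argv]


def dmtcp_checkpoint_cmd() -> list:
    # Asks the DMTCP coordinator to checkpoint every attached process.
    return ["dmtcp_command", "--checkpoint"]


def dmtcp_restart_cmd(image_path: str) -> list:
    return ["dmtcp_restart", image_path]


if __name__ == "__main__":
    commands = [
        criu_dump_cmd(1234, "/tmp/ckpt"),        # hypothetical PID and directory
        criu_restore_cmd("/tmp/ckpt"),
        dmtcp_launch_cmd(["./app"]),             # hypothetical application
        dmtcp_checkpoint_cmd(),
        dmtcp_restart_cmd("ckpt_app.dmtcp"),     # placeholder image name
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Note the difference in model: CRIU attaches to an already running process tree, while DMTCP requires the application to be launched under it.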
Page 17
Apache Mesos: The datacenter kernel
Page 18
Why?
Why can’t we run applications on our datacenters just like we run applications on our mobile phones?
We’re all building distributed systems.
Page 19
The datacenter abstraction
Page 20
The datacenter computer needs an operating system
Operating system
“a collection of software that manages the computer hardware resources and provides common services for computer programs”
- Wikipedia
Page 21
Resource offers
Offer-based model
Mesos can’t run applications on its own.
A Mesos framework is a distributed system that has a scheduler.
Schedulers like Marathon keep your application running. A bit like a distributed “init.d”.
Page 22
High utilization
[Figure: Apache Mesos utilization; axis label: time]
Page 23
Mesos mechanics
[Diagram: master, agent, scheduler]
RESOURCES (cpu, mem, disk, etc)
Page 24
Mesos mechanics
[Diagram: master, agent, scheduler]
OFFER (cpu, mem, disk, etc)
Page 25
Mesos mechanics
[Diagram: master, agent, scheduler]
scheduler:
{
  "container": {
    "docker": {
      "image": "busybox"
    },
    "type": "DOCKER"
  },
  "cpus": 0.1,
  "id": "demo",
  "instances": 1,
  "mem": 128
}
Page 26
Mesos mechanics
[Diagram: master, agent, scheduler]
ACCEPT OFFER (cpu, mem, disk, etc)
Page 27
Mesos mechanics
[Diagram: master, agent, scheduler]
LAUNCH TASK
Page 28
Mesos mechanics
[Diagram: master, agent, scheduler]
UPDATE STATE (STAGING, RUNNING, etc)
Page 29
Mesos mechanics
[Diagram: master, agent, scheduler]
UPDATE STATE (STAGING, FAILED, etc)
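The message sequence on the preceding slides (OFFER, ACCEPT OFFER, LAUNCH TASK, UPDATE STATE) can be traced with a toy model. This sketch does not use the Mesos API: the `Offer` and `Task` classes, the first-fit rule, the agent names, and the "DECLINE ALL OFFERS" message are assumptions made for illustration. Only the demo task's size (0.1 cpus, 128 MB) comes from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    offer_id: str
    agent: str
    cpus: float
    mem: int  # MB


@dataclass
class Task:
    task_id: str
    cpus: float
    mem: int  # MB


def choose_offer(task: Task, offers: List[Offer]) -> Optional[Offer]:
    """Return the first offer large enough for the task, or None to keep waiting."""
    for offer in offers:
        if offer.cpus >= task.cpus and offer.mem >= task.mem:
            return offer
    return None


def launch(task: Task, offers: List[Offer]) -> List[str]:
    """Trace the messages exchanged for one task in the happy path."""
    offer = choose_offer(task, offers)
    if offer is None:
        return ["DECLINE ALL OFFERS"]  # hypothetical message, not on the slides
    return [
        f"ACCEPT OFFER {offer.offer_id}",
        f"LAUNCH TASK {task.task_id} on {offer.agent}",
        "UPDATE STATE STAGING",
        "UPDATE STATE RUNNING",
    ]


if __name__ == "__main__":
    demo = Task("demo", cpus=0.1, mem=128)
    offers = [
        Offer("o1", "agent-1", cpus=0.05, mem=64),   # too small for the demo task
        Offer("o2", "agent-2", cpus=2.0, mem=4096),
    ]
    for message in launch(demo, offers):
        print(message)
```

The point of the offer-based model is visible in `choose_offer`: the master only says what is available, and the framework's scheduler decides what to run where.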
Page 30
Mesos mechanics: Custom executor
[Diagram: master, agent, scheduler]
scheduler:
{
  "container": {
    "docker": {
      "image": "busybox"
    },
    "type": "DOCKER"
  },
  "cpus": 0.1,
  "id": "demo",
  "executor": "demo-executor",
  "mem": 128
}
Page 31
Mesos mechanics
[Diagram: Agent, Executor, Task]
LAUNCH TASK
Page 32
Mesos mechanics
[Diagram: Agent, Executor, Task]
LAUNCH TASK
Page 33
Mesos mechanics
[Diagram: Agent, Executor, Task]
TASK STATE
Page 34
Mesos mechanics
[Diagram: Agent, Executor, Task]
UPDATE STATE
Page 35
Mesos mechanics
[Diagram: Agent, Executor, Task]
UPDATE STATE
Page 36
Mesos mechanics
[Diagram: Agent, Executor, Task]
ISOLATION
Page 37
Mesos mechanics are fair
[Diagram: one master, five agents, schedulers A through E]
Page 38
Mesos mechanics are HA
[Diagram: masters 1 through 3 with ZooKeeper, five agents, schedulers A through E]
Page 39
APACHE MESOS: Putting it all together
[Diagram: masters m 1 through m 9 with ZooKeeper, a large grid of agents (a), and many instances of schedulers A through E]
Page 40
Mesos Container Migration
Page 41
RUNC
● OCI specification
● Well integrated with CRIU
● Lightweight universal container runtime
● Compatible with Docker
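Because runc wraps CRIU, a container migration can be expressed as three commands: checkpoint on the source, copy the image, restore on the target. The sketch below only builds that plan and runs nothing. The function name, container id, paths, and host name are hypothetical; `rsync` is one possible way to move the image, not necessarily what the demo uses; and the plan assumes the container's bundle (its config and root filesystem) is already present at the same path on the target node.

```python
import shlex


def migration_plan(container_id: str, image_path: str,
                   bundle: str, target_host: str) -> list:
    """Return (where, argv) pairs for a checkpoint / copy / restore migration."""
    return [
        # By default runc stops the container once the checkpoint is written.
        ("source", ["runc", "checkpoint", "--image-path", image_path, container_id]),
        # Copy the checkpoint image directory to the same path on the target.
        ("source", ["rsync", "-a", image_path + "/", f"{target_host}:{image_path}/"]),
        # Recreate the container on the target from the copied image.
        ("target", ["runc", "restore", "--image-path", image_path,
                    "--bundle", bundle, container_id]),
    ]


if __name__ == "__main__":
    plan = migration_plan("demo", "/tmp/ckpt-demo", "/containers/demo", "node-b")
    for where, argv in plan:
        print(f"[{where}] {shlex.join(argv)}")
```

These three steps correspond to stages 2 through 4 of the migration recipe; stage 1 (pause) is handled by the checkpoint itself.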
Page 42
Mesos mechanics: Custom executor
[Diagram: Mesos agent, Volt Scheduler, Volt Executor]
Volt Scheduler:
{
  "container": {
    "docker": {
      "image": "busybox"
    },
    "type": "DOCKER"
  },
  "cpus": 0.1,
  "id": "demo",
  "executor": "volt-executor",
  "mem": 128
}
Page 43
Mesos mechanics
[Diagram: Agent, VOLT Executor, one RunC instance]
LAUNCH TASK
Page 44
Mesos mechanics
[Diagram: Agent, VOLT Executor, two RunC instances]
LAUNCH TASK
Page 45
Mesos mechanics
[Diagram: Agent, VOLT Executor, three RunC instances]
LAUNCH TASK
Page 46
Demo!
Page 47
Future Work: Checkpointing as a Service
First-class integration with Mesos
  ○ Transparent to the scheduler and executor
  ○ New task states (CHECKPOINTED, RESTORING, etc)
  ○ Support multiple checkpoint-service providers (DMTCP, CRIU, etc)
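One way to picture this future work is a small state machine plus a pluggable provider interface. Everything in the sketch below is hypothetical: the slide names the states CHECKPOINTED and RESTORING but not the transitions between them, and the `CheckpointProvider` interface, `FakeProvider`, and `MigratableTask` are invented here purely to illustrate the idea. None of it is an existing Mesos API.

```python
from abc import ABC, abstractmethod
from enum import Enum


class TaskState(Enum):
    RUNNING = "RUNNING"
    CHECKPOINTED = "CHECKPOINTED"
    RESTORING = "RESTORING"


# Assumed transitions; the slide names the states but not the edges.
ALLOWED = {
    TaskState.RUNNING: {TaskState.CHECKPOINTED},
    TaskState.CHECKPOINTED: {TaskState.RESTORING},
    TaskState.RESTORING: {TaskState.RUNNING},
}


class CheckpointProvider(ABC):
    """Hypothetical interface a DMTCP- or CRIU-backed provider would implement."""

    @abstractmethod
    def checkpoint(self, task_id: str) -> str:
        """Save the task and return an identifier for the image."""

    @abstractmethod
    def restore(self, image: str) -> None:
        """Bring a task back from a previously saved image."""


class FakeProvider(CheckpointProvider):
    """In-memory stand-in so the sketch runs without any checkpointing tool."""

    def __init__(self) -> None:
        self.restored = []

    def checkpoint(self, task_id: str) -> str:
        return f"{task_id}.img"

    def restore(self, image: str) -> None:
        self.restored.append(image)


class MigratableTask:
    def __init__(self, task_id: str, provider: CheckpointProvider) -> None:
        self.task_id = task_id
        self.provider = provider
        self.state = TaskState.RUNNING
        self.history = [self.state]

    def _move(self, new_state: TaskState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

    def migrate(self) -> None:
        image = self.provider.checkpoint(self.task_id)
        self._move(TaskState.CHECKPOINTED)
        self._move(TaskState.RESTORING)
        self.provider.restore(image)
        self._move(TaskState.RUNNING)
```

With this shape, the scheduler and executor would only observe the state changes, while the choice between providers stays behind the interface.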
Page 48
THANK YOU!