MIGRATORY WORKLOADS ACROSS CLOUDS WITH NOMAD Phil Watts DevOps Artificer

Migratory Workloads Across Clouds with Nomad

Jan 21, 2017


Technology

Philip Watts
Transcript
Page 1: Migratory Workloads Across Clouds with Nomad

MIGRATORY WORKLOADS ACROSS CLOUDS WITH NOMAD

Phil Watts DevOps Artificer

Page 2: Migratory Workloads Across Clouds with Nomad

PROBLEM STATEMENT

“FLEXING BETWEEN THE CLOUDS”

▸ Goals of Virtualization seem universally applicable

▸ !(Vendor Lock-in)

▸ Not all workloads are valued equally

=> IT Magic Anywhere

Page 3: Migratory Workloads Across Clouds with Nomad

SUCCESS CRITERIA

WIN CONDITIONS

‣ Availability of compute resources is independent of the cloud provider

‣ Batch jobs can be allocated based on point-in-time cost metrics

‣ Work segregation based on compliance qualifications

Page 4: Migratory Workloads Across Clouds with Nomad

TOOLCHAIN

MY CURRENT “FAVORITE” TOYS

Resources

Image Creation

Infrastructure Provisioning

Service Discovery

Scheduler

Driver

Page 5: Migratory Workloads Across Clouds with Nomad

DEFINITIONS: RESOURCE CONTEXT

THE BANE OF TECHNICAL UNDERSTANDING (AKA WORDS):

▸ Region: The isolation boundary of a Nomad Cluster

▸ Datacenter: Low latency, high bandwidth, private network

▸ Resources: The available capacity provided by a node

Common / Comfortable Pattern

         Region         Datacenter
AWS      Continental    AWS_Region
GCE      Continental    GCE_Region
Azure    Location       Location

Ideal Pattern

         Region    Datacenter
AWS      Global    AWS_Region
GCE      Global    GCE_Region
Azure    Global    Sets of Locations

Page 6: Migratory Workloads Across Clouds with Nomad

NOMAD ARCHITECTURE - SINGLE REGION VIEW

BDFL FOR WORKLOAD DECISIONS

‣ In Nomad, Datacenters can speak to region-aware Servers

‣ Datacenters don’t need to be the same platform

‣ Default Region is “global”
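
As a rough sketch of the points above, a job file can name one region and list datacenters that live on different platforms. The job name, datacenter names, and image below are hypothetical and only illustrate the shape of the file.

```hcl
# Hypothetical job: one region, two datacenters on different clouds.
job "example" {
  region      = "global"                              # the default region
  datacenters = ["aws-us-east-1", "gce-us-central1"] # assumed datacenter names

  group "web" {
    task "server" {
      driver = "docker"

      config {
        image = "nginx" # placeholder image
      }
    }
  }
}
```

The scheduler is then free to place the group in either datacenter, subject to any constraints on the job.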

Page 7: Migratory Workloads Across Clouds with Nomad

ARCHITECTURE OF SOLUTION

▸ Nomad Clients potentially provide Resources for Jobs

▸ Communication between Datacenters may need to be secured

▸ Nodes run a Consul Agent and Nomad Client

▸ Nomad Servers “Bin Pack” tasks onto nodes

THREE PICTURES OF THE SAME THING

Single Region / Multi DataCenter (different Clouds)
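
A minimal sketch of the agent configuration on one such node, assuming a Nomad client that joins the shared region from a GCE datacenter and talks to a local Consul agent. The datacenter name, data directory, and server address are hypothetical.

```hcl
# Hypothetical Nomad client agent configuration for a node in one datacenter.
region     = "global"          # every datacenter joins the same region
datacenter = "gce-us-central1" # assumed name for this node's datacenter
data_dir   = "/var/lib/nomad"  # assumed path

client {
  enabled = true
  servers = ["nomad-server.example.internal:4647"] # assumed server address
}

consul {
  address = "127.0.0.1:8500" # the Consul agent running on the same node
}
```

A node in another cloud would use the same file with a different `datacenter` value.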

Page 8: Migratory Workloads Across Clouds with Nomad

DEFINITIONS: TASK CONTEXT

WORDS: THE SEQUEL

▸ Task: Desired state declaration of workload

▸ Constraints: Rules limiting where a job can run

▸ Evaluations: Queued request to compare desired and present state of work over the region

▸ Caused by a state change event

▸ Job Completion

▸ Node Addition/Subtraction

▸ Job Scheduled

▸ Allocations: Mapping of tasks to resources within constraints

Page 9: Migratory Workloads Across Clouds with Nomad

JOB TYPES: SERVICE

KEEPING THE SITE UP

▸ Long running jobs that should always be available

▸ Scheduling decisions favor QoS

▸ Example: Ensuring a front end web service is always available
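
A front end web service along these lines might be declared as follows. This is only a sketch: the job name, datacenter names, image, and resource figures are assumptions.

```hcl
# Hypothetical service job: Nomad tries to keep three instances running.
job "frontend" {
  datacenters = ["aws-us-east-1", "gce-us-central1"] # assumed names
  type        = "service"

  group "web" {
    count = 3 # desired number of running instances

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx" # placeholder image
      }

      resources {
        cpu    = 500 # MHz
        memory = 256 # MB
      }
    }
  }
}
```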

Page 10: Migratory Workloads Across Clouds with Nomad

JOB TYPES: BATCH

WHAT TO DO WITH ALL THIS DATA?

▸ A set of work spanning a few minutes to a few days

▸ Based on the Berkeley Sparrow Two Choices model

▸ http://people.eecs.berkeley.edu/~keo/publications/sosp13-final17.pdf

▸ Probes a set of nodes which meet constraints and sends work to the "least loaded" nodes

▸ Example: Tasks to manipulate a queue of data when present
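
A queue-processing batch job could be sketched as below. The job name, datacenter names, command path, and schedule are hypothetical; the `periodic` block is one possible way to check the queue on a timer, not something the talk prescribes.

```hcl
# Hypothetical batch job that runs a queue worker every 15 minutes.
job "queue-drain" {
  datacenters = ["aws-us-east-1", "gce-us-central1"] # assumed names
  type        = "batch"

  periodic {
    cron             = "*/15 * * * *"
    prohibit_overlap = true # skip a launch if the previous run is still going
  }

  group "workers" {
    count = 4

    task "drain" {
      driver = "exec"

      config {
        command = "/usr/local/bin/drain-queue" # hypothetical worker binary
      }
    }
  }
}
```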

Page 11: Migratory Workloads Across Clouds with Nomad

JOB TYPES: SYSTEM

KEEPING THE LIGHTS ON

▸ A unique job type used to declare jobs which should run on every node which meets the job constraints

▸ Are re-evaluated whenever a node joins the cluster

▸ Example: distributing common tasks, which can benefit from rolling updates, job updates, service discovery
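
As an illustration, a system job for a per-node helper might look like this. The job name, datacenter names, and image are assumptions.

```hcl
# Hypothetical system job: one instance on every node that meets the constraint.
job "log-shipper" {
  datacenters = ["aws-us-east-1", "gce-us-central1"] # assumed names
  type        = "system"

  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  group "shipper" {
    task "ship" {
      driver = "docker"

      config {
        image = "example/log-shipper" # hypothetical image
      }
    }
  }
}
```

No `count` is given, since the number of instances follows the number of matching nodes.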

Page 12: Migratory Workloads Across Clouds with Nomad

NOMAD SCHEDULING INTERNALS

GETTING FROM WORK AND RESOURCES TO ACCOMPLISHMENTS

▸ Evaluations read the Job Specification and find constraints

▸ Evaluation Brokers maintain the pending queue, priority, and at-least-once delivery

▸ Schedulers submit an Allocation Plan, evaluated for feasibility, followed by priority

▸ Allocations set jobs against resources
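
From the operator's side, the priority mentioned above is set per job. The fragment below is a sketch with hypothetical names; it relies on the job specification's `priority` field, which accepts values from 1 to 100 and defaults to 50.

```hcl
# Hypothetical job raised above the default priority of 50.
job "billing-report" {
  datacenters = ["aws-us-east-1"] # assumed name
  type        = "batch"
  priority    = 80

  group "report" {
    task "generate" {
      driver = "exec"

      config {
        command = "/usr/local/bin/generate-report" # hypothetical binary
      }
    }
  }
}
```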

Page 13: Migratory Workloads Across Clouds with Nomad

BIN PACKING

LIKE TETRIS FOR WORKLOADS

▸ Tasks require resources

▸ Nodes have “dimensions” of resources

▸ Allocation fits Tasks inside Nodes
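
The resources a task requires are declared in its job file. A minimal sketch of such a task, with assumed figures and a hypothetical image, is shown here as a fragment that would sit inside a group.

```hcl
# Fragment of a group: the task's resource ask is what gets packed onto a node.
task "worker" {
  driver = "docker"

  config {
    image = "example/worker" # hypothetical image
  }

  resources {
    cpu    = 250 # MHz
    memory = 128 # MB
  }
}
```

Each value is one of the "dimensions" the scheduler has to fit within a node's remaining capacity.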

Page 14: Migratory Workloads Across Clouds with Nomad

TASK GROUPS

PREVENTING TASK SEPARATION ANXIETY

▸ Task Groups allow multiple Tasks to require that they are scheduled on the same node

▸ Are created implicitly for single tasks in isolation

▸ Can be used to enforce compliance elements required to run together

▸ Example: Requiring log shipping co-processes
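
The log shipping example might be sketched as a group with two tasks. The group, task, and image names are hypothetical.

```hcl
# Fragment of a job: both tasks in the group are placed on the same node.
group "api" {
  count = 2

  task "app" {
    driver = "docker"

    config {
      image = "example/api" # hypothetical image
    }
  }

  task "log-shipper" {
    driver = "docker"

    config {
      image = "example/log-shipper" # hypothetical image
    }
  }
}
```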

Page 15: Migratory Workloads Across Clouds with Nomad

CONSTRAINTS

JUST BECAUSE YOU CAN, DOESN’T MEAN YOU SHOULD

▸ Job Constraints limit the resources available for a particular job group

▸ Constraints can map workloads directly to Customized Hardware such as AWS Placement Groups
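
One way to express this is a constraint on the node class, assuming operators label specialized nodes with a class when configuring the client. The class value below is hypothetical.

```hcl
# Fragment of a job or group: restrict placement to specially labelled nodes.
constraint {
  attribute = "${node.class}"
  value     = "cluster-compute" # hypothetical class assigned to the special hardware
}

# Optionally keep instances of the group on separate hosts.
constraint {
  operator = "distinct_hosts"
  value    = "true"
}
```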

Page 16: Migratory Workloads Across Clouds with Nomad

CONSTRAINTS AND COMPLIANCE

SATISFYING COMPLIANCE REQUIREMENTS

▸ Constraints on datacenter can be used for Data Isolation inside National Boundaries.

▸ Healthcare workload that must stay within the EU

▸ Metadata attributes can allow for custom declarations.

▸ E.g. PCI DSS Compliance:

▸ Maintain network firewall

▸ Run Anti-Malware/Anti-Virus protection

▸ Monitor and log access

▸ Regularly test security systems and procedures.

job "sample_service" {
  ...
  meta {
    pci_dss = "true"
  }
  group "webservice" {
    constraint {
      attribute = "${meta.pci_dss}"
      value     = "true"
    }
  }
}

Constraint Snippet
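
A constraint on a `meta` attribute is matched against metadata declared on the client node. A sketch of the corresponding agent configuration, assuming the operator marks compliant nodes by hand with the same `pci_dss` key, could be:

```hcl
# Hypothetical client agent configuration for a node that has passed PCI DSS checks.
client {
  enabled = true

  meta {
    pci_dss = "true" # custom declaration that job constraints can match on
  }
}
```

Nodes without this key would not satisfy the constraint and would receive no allocations for the group.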

Page 17: Migratory Workloads Across Clouds with Nomad

CONSTRAINTS: SATISFYING SPECIAL NEEDS

DIFFERENT THINGS ARE DIFFERENT

▸ Not all platforms are created equal

▸ Platform attributes for specifying Cloud Platforms

job "sample_service" {
  ...
  constraint {
    attribute = "${attr.platform}"
    value     = "aws"
  }
}

▸ ${attr.platform} = aws may be relevant if you need Float (GPU) processing, which AWS offers and GCE doesn’t

Page 18: Migratory Workloads Across Clouds with Nomad

RAW EXECS

CHEKHOV’S TASK DRIVER

▸ Unconstrained, Un-isolated, Disabled by Default

▸ Runs as the user Nomad is running as

client {
  options = {
    "driver.raw_exec.enable" = "1"
  }
}

“IT SEEMS TO BE A DEEP INSTINCT IN HUMAN BEINGS FOR MAKING EVERYTHING COMPULSORY THAT ISN'T FORBIDDEN” ~Robert A. Heinlein
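
Once the driver is enabled on a client, a job can use it like any other driver. The sketch below uses hypothetical job and datacenter names and runs a harmless command. Since there is no isolation, anything placed here runs with the Nomad agent's privileges.

```hcl
# Hypothetical batch job using the raw_exec driver (no isolation).
job "host-maintenance" {
  datacenters = ["aws-us-east-1"] # assumed name
  type        = "batch"

  group "maintenance" {
    task "uptime" {
      driver = "raw_exec"

      config {
        command = "/usr/bin/uptime" # runs directly on the host as the Nomad user
      }
    }
  }
}
```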

Page 19: Migratory Workloads Across Clouds with Nomad

OPERATOR INTERACTION

RELIABLE MAGIC = OPERATIONS

$ nomad run -address=$nomad_server jobfile.nomad

‣ Operators schedule jobs against a server

‣ Nomad figures out how/where/when to run tasks

‣ Complex solution through iteration

Page 20: Migratory Workloads Across Clouds with Nomad

Phil Watts DevOps Artificer @ REĀN Cloud

@pwattstbd github.com/marsupermammal

www.reancloud.com

import "os"

func presentation() { os.Exit(0) }