Containers @ Netflix: How they add to a proven cloud architecture
Velocity NYC 2016 - Containers @ Netflix

Apr 16, 2017

aspyker
Transcript
Page 1: Velocity NYC 2016 - Containers @ Netflix

Containers @ Netflix: How they add to a proven cloud architecture

Page 2: Velocity NYC 2016 - Containers @ Netflix
Page 3: Velocity NYC 2016 - Containers @ Netflix

https://www.flickr.com/photos/hinnosaar/2655128664

Page 4: Velocity NYC 2016 - Containers @ Netflix

Datacenters

Java monolithic app

Oracle database

Page 5: Velocity NYC 2016 - Containers @ Netflix
Page 6: Velocity NYC 2016 - Containers @ Netflix

AWS cloud

Java microservices

Cassandra

Page 7: Velocity NYC 2016 - Containers @ Netflix

Netflix Open Source Software

http://netflix.github.io

Page 8: Velocity NYC 2016 - Containers @ Netflix
Page 9: Velocity NYC 2016 - Containers @ Netflix

Containers @ Netflix: How they add to a proven cloud architecture

Page 10: Velocity NYC 2016 - Containers @ Netflix

batch applications

Page 11: Velocity NYC 2016 - Containers @ Netflix

Multi-tenant (cgroups/Mesos) historically used for batch

Linux cgroups

Page 12: Velocity NYC 2016 - Containers @ Netflix

What do batch users want?

● Simple shared resources, run till done, job files

● NOT
  ○ EC2 instance sizes, autoscaling, AMI OSes

● WHY
  ○ Offloads resource management ops, simpler
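
As an illustration of the "job files" idea above, a batch job can be described by what it runs and how much of the shared pool it needs, with no mention of instance types or AMIs. The sketch below is hypothetical: `BatchJob`, its field names, and the image name are assumptions made for this example, not the Titus job format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BatchJob:
    """Hypothetical job file: what to run and how much it needs."""
    image: str                      # container image (name is made up below)
    command: List[str]              # entrypoint and arguments
    cpus: float = 1.0               # share of a pool, not an EC2 instance size
    memory_mb: int = 512
    gpus: int = 0
    env: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Reject specs that cannot be scheduled at all."""
        if not self.image:
            raise ValueError("image is required")
        if self.cpus <= 0 or self.memory_mb <= 0 or self.gpus < 0:
            raise ValueError("resource requests must be positive")


# Example: a run-till-done job that only states its resource needs.
job = BatchJob(image="example/encoder:1.0",
               command=["encode", "--title", "12345"],
               cpus=4, memory_mb=8192)
job.validate()
```

The user never picks an instance size or an OS image; the platform decides where the container lands.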

Page 13: Velocity NYC 2016 - Containers @ Netflix

Titus

Batch

Job Management

Resource Management & Optimization

Container Execution

Integration

Workflow, Data Analysis, Adhoc Upstream Systems
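
One way to picture the "Resource Management & Optimization" layer is as a placement problem: given container CPU requests and a pool of hosts, decide where each container runs. The following is a minimal sketch using the generic first-fit-decreasing heuristic. The slides do not describe the actual algorithm, and `place_first_fit` and the job and host names are assumptions for illustration only.

```python
from typing import Dict, Optional


def place_first_fit(requests: Dict[str, float],
                    host_cpus: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Assign each job to the first host with enough free CPU.

    Jobs are considered largest first (first-fit decreasing).
    A job maps to None when no host can hold it.
    """
    free = dict(host_cpus)  # copy so the caller's capacities are untouched
    placement: Dict[str, Optional[str]] = {}
    ordered = sorted(requests.items(), key=lambda kv: kv[1], reverse=True)
    for job, cpus in ordered:
        placement[job] = None
        for host in free:
            if free[host] >= cpus:
                free[host] -= cpus
                placement[job] = host
                break
    return placement


# Example with made-up jobs and hosts.
result = place_first_fit({"train": 4, "encode": 3, "report": 2},
                         {"host-a": 4, "host-b": 4})
```

A production scheduler would also weigh memory, GPUs, network, and constraints; this sketch only shows the basic packing step.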

Page 14: Velocity NYC 2016 - Containers @ Netflix

Netflix Batch Job Examples

● Algorithm Model Training (with GPUs)

Page 15: Velocity NYC 2016 - Containers @ Netflix

Netflix Batch Job Examples

● Media Encoding

● Digital Watermarking

Page 16: Velocity NYC 2016 - Containers @ Netflix

Netflix Batch Job Examples

Open Connect CDN Reporting

Adhoc Reporting

Page 17: Velocity NYC 2016 - Containers @ Netflix

Lessons Learned from Batch

● Docker helped generalize use cases

Page 18: Velocity NYC 2016 - Containers @ Netflix

Lessons Learned from Batch

● Docker helped generalize use cases
● Advanced scheduling required

Page 19: Velocity NYC 2016 - Containers @ Netflix

Lessons Learned from Batch

● Docker helped generalize use cases
● Advanced scheduling required
● Initially ignored failures (with retries)

Page 20: Velocity NYC 2016 - Containers @ Netflix

Lessons Learned from Batch

● Docker helped generalize use cases
● Advanced scheduling required
● Initially ignored failures (with retries)
● Time sensitive batch came later
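
The "ignored failures (with retries)" lesson can be sketched as a wrapper that simply reruns a failed task a bounded number of times. This is a minimal illustration, not the platform's retry logic: `run_with_retries`, its parameters, and the backoff schedule are assumptions made for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T],
                     max_attempts: int = 3,
                     base_delay_s: float = 0.0) -> T:
    """Run task until it succeeds or max_attempts is exhausted.

    Failures are not diagnosed, only retried, with the delay
    doubling after each failed attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as err:  # deliberately broad: any failure is retried
            last_error = err
            if attempt < max_attempts:
                time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

Blind retries work for run-till-done jobs, but they delay completion, which is one reason time sensitive batch needs more careful failure handling.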

Page 21: Velocity NYC 2016 - Containers @ Netflix

Current Container Usage - Batch

● 100 containers / hour
● Peaks of 1000s per hour
● Large spikes of CI testing and Digital Watermarking

A random day’s worth of containers

Page 22: Velocity NYC 2016 - Containers @ Netflix

service applications

Page 23: Velocity NYC 2016 - Containers @ Netflix

Why Services in containers?

Theory Reality

Page 24: Velocity NYC 2016 - Containers @ Netflix

“Why is Apache and Tomcat running on my NodeJS server”

Page 25: Velocity NYC 2016 - Containers @ Netflix

“Why is Apache and Tomcat running on my NodeJS server”

Problem: Base AMI optimized for Java, not easily customizable

Page 26: Velocity NYC 2016 - Containers @ Netflix

“Why do I need java, gradle, ospackage after my non-Java build?”

Page 27: Velocity NYC 2016 - Containers @ Netflix

“Why do I need java, gradle, ospackage after my non-Java build?”

Problem: Reuse of Java-centric AMI tooling

Page 28: Velocity NYC 2016 - Containers @ Netflix

“I want an instance with a single core to run my lightweight server”

Page 29: Velocity NYC 2016 - Containers @ Netflix

“I want an instance with a single core to run my lightweight server”

Problem: Small instances are not reliable

Page 30: Velocity NYC 2016 - Containers @ Netflix

Enter Docker

● Have a new language?
● Have a build tool you like?
● Want to resource isolate easily?

Come one, come all

Page 31: Velocity NYC 2016 - Containers @ Netflix

Services are just long running batch?

Services

Job Management

Resource Management & Optimization

Container Execution

Integration

Service Apps

Batch

Page 32: Velocity NYC 2016 - Containers @ Netflix

Services more complex

● Services resize constantly and run forever
  ○ Autoscaling
  ○ Hard to upgrade underlying hosts

Page 33: Velocity NYC 2016 - Containers @ Netflix

Services more complex

● Services resize constantly and run forever
  ○ Autoscaling
  ○ Hard to upgrade underlying hosts

● Have more state
  ○ Ready for traffic vs. just started/stopped
  ○ Even harder to upgrade

Page 34: Velocity NYC 2016 - Containers @ Netflix

Services more complex

● Services resize constantly and run forever
  ○ Autoscaling
  ○ Hard to upgrade underlying hosts

● Have more state
  ○ Ready for traffic vs. just started/stopped
  ○ Even harder to upgrade

● Existing well defined dev, deploy, runtime & ops tools

Page 35: Velocity NYC 2016 - Containers @ Netflix

Real networking is hard

Page 36: Velocity NYC 2016 - Containers @ Netflix

Multi-tenant

Need IP per container - in VPC

Using security groups

Using IAM roles

Considering network resource isolation

Page 37: Velocity NYC 2016 - Containers @ Netflix

Solutions

● VPC Networking driver
  ○ Supports ENIs - full IP functionality
  ○ Scheduled security groups
  ○ Supports traffic control (isolation)

● EC2 Metadata proxy
  ○ Adds container “node” identity
  ○ Delivers IAM roles
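
To make the metadata proxy idea concrete, the sketch below shows one way a proxy could decide how to answer a request based on which container sent it. The path prefix is the real EC2 instance metadata location for role credentials; everything else (`route_metadata_request`, the action names, the IP-to-role table) is an assumption for illustration and not the actual Netflix implementation.

```python
from typing import Dict, Tuple

# Real EC2 instance metadata path under which role credentials are served.
IAM_PREFIX = "/latest/meta-data/iam/security-credentials/"


def route_metadata_request(source_ip: str, path: str,
                           role_by_container_ip: Dict[str, str]) -> Tuple[str, str]:
    """Decide how a hypothetical metadata proxy answers one request.

    Returns an (action, detail) pair and performs no I/O itself.
    """
    role = role_by_container_ip.get(source_ip)
    if role is None:
        return ("deny", "unknown container")
    if path == IAM_PREFIX:
        # Listing roles: the container sees only its own role name.
        return ("role_name", role)
    if path.startswith(IAM_PREFIX):
        requested = path[len(IAM_PREFIX):]
        if requested == role:
            return ("credentials", role)
        return ("deny", "role not assigned to this container")
    # Anything else is forwarded to the host's metadata service.
    return ("passthrough", path)


# Example with made-up addresses and role names.
roles = {"10.0.0.5": "encoderRole", "10.0.0.6": "reportRole"}
decision = route_metadata_request("10.0.0.5", IAM_PREFIX + "encoderRole", roles)
```

Keying the decision on the container's own IP is what makes the IP-per-container requirement useful here: two containers on the same host can receive different IAM roles.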

Page 38: Velocity NYC 2016 - Containers @ Netflix

Reuse existing infrastructure services

[Diagram labels: VMs, EC2, AWS AutoScaler, App, Cloud Platform (metrics, IPC, health), VPC]

Netflix Cloud Infrastructure (VMs + Containers)

Atlas, Eureka, Edda

Page 39: Velocity NYC 2016 - Containers @ Netflix

Enable them for containers

[Diagram labels, VM side: VMs, EC2, AWS AutoScaler, App, Cloud Platform (metrics, IPC, health), VPC]

[Diagram labels, container side: VMs, Titus Job Control, Containers, App, Cloud Platform (metrics, IPC, health), Batch Containers]

Netflix Cloud Infrastructure (VMs + Containers)

Atlas, Eureka, Edda

Page 40: Velocity NYC 2016 - Containers @ Netflix

Spinnaker

Page 41: Velocity NYC 2016 - Containers @ Netflix

Deploy based on new image tags

Page 42: Velocity NYC 2016 - Containers @ Netflix

Basic resource requirements

IAM Roles & Sec Groups per container

Deploy Strategies

Same as VMs

Page 43: Velocity NYC 2016 - Containers @ Netflix

Easily see health & discovery

Page 44: Velocity NYC 2016 - Containers @ Netflix
Page 45: Velocity NYC 2016 - Containers @ Netflix
Page 46: Velocity NYC 2016 - Containers @ Netflix

Current Container Usage - Service

● Still small - 100s of containers

● NodeJS Device UI Scripts Apps
● Stream Processing Jobs - Flink
● Various Internal Dashboards

Page 47: Velocity NYC 2016 - Containers @ Netflix

developer experience

Page 48: Velocity NYC 2016 - Containers @ Netflix

The Docker experience

● Consistent Mac setup
● Consistent workflows
● Netflix integration

Page 49: Velocity NYC 2016 - Containers @ Netflix

NEWT (Netflix Workflow Toolkit)

Page 50: Velocity NYC 2016 - Containers @ Netflix

dev machine bootstrap

Page 51: Velocity NYC 2016 - Containers @ Netflix

project scaffolding

Page 52: Velocity NYC 2016 - Containers @ Netflix

setup dev pipelines

Page 53: Velocity NYC 2016 - Containers @ Netflix

run locally

Page 54: Velocity NYC 2016 - Containers @ Netflix

build history

start pipeline

Page 55: Velocity NYC 2016 - Containers @ Netflix

beyond java

Page 56: Velocity NYC 2016 - Containers @ Netflix

beyond java

Page 57: Velocity NYC 2016 - Containers @ Netflix
Page 58: Velocity NYC 2016 - Containers @ Netflix

Node.js support

Before Newt:
● Install Java
● Install Nebula (Netflix Gradle)
● Add a build.gradle
● Run gradlew wrapper
● Add deb instructions to build.gradle
● Install Vagrant + VBox
● Test deb locally
● Create Stash repo
● Create Jenkins job
● Create Spinnaker pipelines
● git push

After Newt:
● Install Newt
● newt init --app-type nodejs
● git push

Nebula ospackage is hidden inside a local Docker container managed by Newt

Page 59: Velocity NYC 2016 - Containers @ Netflix

https://en.wikipedia.org/wiki/Data_visualization#/media/File:Social_Network_Analysis_Visualization.png

dependencies @ scale

Page 60: Velocity NYC 2016 - Containers @ Netflix

Project Niagara

Page 61: Velocity NYC 2016 - Containers @ Netflix
Page 62: Velocity NYC 2016 - Containers @ Netflix

100,000 builds a day, peak

Page 63: Velocity NYC 2016 - Containers @ Netflix

where are we going?

Page 64: Velocity NYC 2016 - Containers @ Netflix

Future of containers @ Netflix

● More scale!
● Guaranteed capacity (service)
● Fair scheduling (batch)
● Local integration test env (devex)
● Next generation CI (devex)
● Internal RI spot market of trough

Page 65: Velocity NYC 2016 - Containers @ Netflix

Questions?