HOW WE LEARNED TO LOVE THE DATA CENTER OPERATING SYSTEM SAULIUS VALATKA / ADFORM

How We Learned To Love The Data Center Operating System

Jan 25, 2017

saulius_vl
Page 1: How We Learned To Love The Data Center Operating System

HOW WE LEARNED TO LOVE THE DATA CENTER OPERATING SYSTEM

SAULIUS VALATKA / ADFORM

Page 4: How We Learned To Love The Data Center Operating System

Online Advertising Full Stack Platform

Real-time "smart" ads

Forecasting, fraud detection, recommenders, etc.

1 million QPS at under 100 ms latency

2 TB of data daily

Page 5: How We Learned To Love The Data Center Operating System

SO WHY CONTAINERS?

TL;DR: data scientists do not care about infrastructure

Page 6: How We Learned To Love The Data Center Operating System

YOLO ERA

ctrtrain.ec2-aws.com test2.ec2-aws.com modelling.ec2-aws.com

Page 8: How We Learned To Love The Data Center Operating System

THE TORTURE

# yum install python R libboost-3.12

$ scp script.R test.aws.com:/script.R

# crontab -e

“strange, worked on my machine …”

Page 9: How We Learned To Love The Data Center Operating System

EARLY ADOPTION

[Diagram labels: hosts ctrtrain.ec2-aws.com, test2.ec2-aws.com, worker-1.adform.com, worker-2.adform.com; IDs ab34na3n, ar2afga3n]

Page 10: How We Learned To Love The Data Center Operating System
Page 11: How We Learned To Love The Data Center Operating System

CONTAINERIZE!

self-contained artifacts

isolated runtime

basically no overhead

unified deployment

Page 12: How We Learned To Love The Data Center Operating System

BUT WAIT …

what about configuration? deployment?

Page 13: How We Learned To Love The Data Center Operating System
Page 14: How We Learned To Love The Data Center Operating System

12 FACTOR APP

III. Config

Store config in the environment

V. Build, release, run

Strictly separate build and run stages

X. Dev/prod parity

Keep development, staging, and production as similar as possible
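
As a minimal sketch of factor III, a containerized job can read its settings from environment variables instead of files baked into the image, so the same image runs unchanged in development and production. The variable names `MODEL_PERIOD` and `TRAIN_WORKERS` are hypothetical and do not come from the talk.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JobConfig:
    period: str   # e.g. the date the training run covers
    workers: int  # degree of parallelism


def load_config(env=os.environ) -> JobConfig:
    """Build the job config from environment variables (hypothetical names)."""
    return JobConfig(
        period=env["MODEL_PERIOD"],                  # required: fail fast if missing
        workers=int(env.get("TRAIN_WORKERS", "1")),  # optional, defaults to 1
    )


if __name__ == "__main__":
    # Passing a plain dict stands in for the real process environment.
    print(load_config({"MODEL_PERIOD": "2016-07-11", "TRAIN_WORKERS": "4"}))
```

Taking the environment as a parameter keeps the function easy to test without touching the real process environment.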

Page 17: How We Learned To Love The Data Center Operating System

BUT WAIT …

where do I log?

and what about metrics?

Page 18: How We Learned To Love The Data Center Operating System
Page 19: How We Learned To Love The Data Center Operating System

MODERN ERA

[Diagram labels (opaque IDs): 4afsdgg, asdf4faf, se4faw, aw3d3ff, g4aefgsd, 5gsdgr54s]

Page 20: How We Learned To Love The Data Center Operating System
Page 21: How We Learned To Love The Data Center Operating System

MARATHON

the init of the DCOS

constraints

deployment

{
  "id": "my-nginx",
  "container": {
    "type": "DOCKER",
    "docker": {"image": "nginx:1.7.7", "network": "BRIDGE"}
  },
  "instances": 1,
  "cpus": 0.5,
  "mem": 128
}
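
An app definition like the one on this slide is typically submitted to Marathon over its REST API. The sketch below builds the same definition in Python and prepares a POST to the `/v2/apps` endpoint without sending it. The base URL `marathon.example.com:8080` is a placeholder assumption, and the helper names are made up for illustration.

```python
import json
import urllib.request

# Hypothetical Marathon address; replace with your own cluster's endpoint.
MARATHON_URL = "http://marathon.example.com:8080"


def build_app(app_id: str, image: str, cpus: float, mem: int, instances: int = 1) -> dict:
    """Return an app definition shaped like the slide's example."""
    return {
        "id": app_id,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
    }


def build_request(app: dict, base_url: str = MARATHON_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST to Marathon's /v2/apps endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(build_app("my-nginx", "nginx:1.7.7", cpus=0.5, mem=128))
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would submit it to a live cluster.
```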

Page 22: How We Learned To Love The Data Center Operating System

SPRINT

the exec of the DCOS

will be open sourced

scheduler to follow!

{
  "labels": {"name": "training", "period": "2016-07-11"},
  "container": {
    "type": "DOCKER",
    "docker": {"image": "ctr-train:0.1"}
  },
  "cpus": 5,
  "mem": 1024
}

Page 25: How We Learned To Love The Data Center Operating System

MANAGING RESOURCES

how much memory do I really need?

and CPUs?

what does 0.5 CPUs mean anyway?

and what happens with the network?

Page 26: How We Learned To Love The Data Center Operating System

¯\_(ツ)_/¯

Page 27: How We Learned To Love The Data Center Operating System

ISOLATION

cpuset

cpu: shares, cfs_period_us, cfs_quota_us

memory: limit_in_bytes, swappiness

blkio: read_iops_device, write_iops_device

net_cls
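
One way to read "0.5 CPUs" in terms of the cgroup knobs listed here, sketched under assumptions: a fractional CPU count is translated into a relative weight (`cpu.shares`) and, when a hard cap is enforced, into a quota per scheduling period (`cpu.cfs_quota_us` over `cpu.cfs_period_us`). The constants below (1024 shares per CPU, a 100 ms period) are the conventional Linux defaults and are assumptions here; the slides do not state how Adform's cluster was configured.

```python
CPU_SHARES_PER_CPU = 1024  # assumed: conventional weight for one full CPU
CFS_PERIOD_US = 100_000    # assumed: default CFS period of 100 ms
MIN_CPU_SHARES = 2         # the kernel rejects cpu.shares values below 2


def cpu_shares(cpus: float) -> int:
    """Relative weight (cpu.shares); it only matters when CPUs are contended."""
    return max(MIN_CPU_SHARES, int(cpus * CPU_SHARES_PER_CPU))


def cfs_quota_us(cpus: float, period_us: int = CFS_PERIOD_US) -> int:
    """Hard cap (cpu.cfs_quota_us): CPU microseconds allowed per period."""
    return int(cpus * period_us)


if __name__ == "__main__":
    for cpus in (0.5, 5):
        print(cpus, cpu_shares(cpus), cfs_quota_us(cpus))
```

Under this reading, 0.5 CPUs means a weight of 512 when the host is busy and, if a quota is enforced, at most 50 ms of CPU time in every 100 ms period.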

Page 28: How We Learned To Love The Data Center Operating System

NETWORK ISOLATION

Layer 3 routing

software-defined networks

Page 29: How We Learned To Love The Data Center Operating System

CURRENT STATE

[Diagram labels (opaque IDs): a4faw3f, 4afsdgg, asdf4faf, se4faw, aw3d3ff, g4aefgsd, 5gsdgr54s, a4rff4afa, 4f4qaf4]

Page 30: How We Learned To Love The Data Center Operating System

SERVICE DISCOVERY

where is my app? how do I reach it?

won’t containers conflict about ports?

Page 31: How We Learned To Love The Data Center Operating System

MARATHON-LB
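
A rough sketch of the idea behind a load balancer such as marathon-lb: each task receives whatever host port is free, and one stable service port is forwarded to the current set of task addresses. The app name, addresses and port numbers are made up for illustration. This does not use marathon-lb's actual templates or API; it only renders an HAProxy-style `listen` section.

```python
from typing import List, Tuple


def render_listen(app: str, service_port: int, tasks: List[Tuple[str, int]]) -> str:
    """Render an HAProxy-style section forwarding one port to many tasks."""
    lines = [
        f"listen {app}",
        f"  bind *:{service_port}",
        "  mode tcp",
        "  balance roundrobin",
    ]
    for i, (host, port) in enumerate(tasks):
        # Each task keeps its own dynamically assigned host port, so two
        # containers on the same host never compete for the same port.
        lines.append(f"  server {app}-{i} {host}:{port} check")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_listen("my-nginx", 10000, [("10.0.0.1", 31001), ("10.0.0.1", 31002)]))
```

Clients only need to know the service port; the section is regenerated whenever tasks start, stop or move.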

Page 32: How We Learned To Love The Data Center Operating System

PERSISTENCE

so... where do I store my data?

on the host? won’t it disappear?

Page 34: How We Learned To Love The Data Center Operating System

PERSISTENCE

container path -> host location

/ -> /var/lib/docker/devicemapper

/opt/app/cache -> /var/lib/mesos/slave/volumes

/opt/app/profile -> /mnt/sdc (network block storage)
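
The pairing of container paths with host locations on this slide can be expressed in a Marathon app definition through the container's `volumes` list. The sketch below covers only the plain host-path case (`containerPath`, `hostPath`, `mode`); the helper names and the example app are hypothetical, and how the network block device ends up mounted at `/mnt/sdc` is outside its scope.

```python
import json


def host_volume(container_path: str, host_path: str, mode: str = "RW") -> dict:
    """Describe a host directory mounted into the container."""
    return {"containerPath": container_path, "hostPath": host_path, "mode": mode}


def with_volumes(app: dict, volumes: list) -> dict:
    """Return a copy of the app definition with volumes attached to its container."""
    container = dict(app.get("container", {}), volumes=list(volumes))
    return dict(app, container=container)


if __name__ == "__main__":
    # Hypothetical app; the image name is taken from the Sprint slide.
    app = {
        "id": "ctr-train",
        "container": {"type": "DOCKER", "docker": {"image": "ctr-train:0.1"}},
    }
    app = with_volumes(app, [host_volume("/opt/app/profile", "/mnt/sdc")])
    print(json.dumps(app, indent=2))
```

Data written under `/opt/app/profile` then lands on the host mount rather than in the container's own filesystem, so it outlives the container.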

Page 35: How We Learned To Love The Data Center Operating System

WHAT’S MISSING

Authorization / Authentication

Better debugging

Page 37: How We Learned To Love The Data Center Operating System

IN SUMMARY

Containers are the perfect level of abstraction

Data science is the perfect use case for containers

Page 38: How We Learned To Love The Data Center Operating System

@adforminsider