Reliability Practices for an Ad System on Docker/Mesos

Post on 20-Feb-2017

Transcript

Reliability Practices for an Ad System on Docker/Mesos

Michael, Apr.2014, 聚效广告 (MediaV)

Who Am I ?

Where is our system?

Small Impression with Huge Computing

AD Request: 1 billion / 20 billion+

QPS: 1 million+ / 10,000

Latency: 500 ms / 10 ms

60 DevOps engineers

2000+ physical servers

100+ modules with realtime service

99.95% service availability
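
To put the 99.95% availability figure in perspective, the arithmetic below converts an availability percentage into an allowed-downtime budget. This is a minimal sketch: the function name and the 365-day year are assumptions for illustration and do not come from the slides.

```python
def downtime_budget_hours(availability_pct: float,
                          period_hours: float = 365 * 24) -> float:
    """Hours of downtime permitted in a period at the given availability."""
    # The unavailable fraction is whatever is left after the availability target.
    unavailable_fraction = 1 - availability_pct / 100
    return period_hours * unavailable_fraction


# Yearly budget at 99.95%, and the same target over a 30-day month (in minutes).
yearly_hours = downtime_budget_hours(99.95)
monthly_minutes = downtime_budget_hours(99.95, period_hours=30 * 24) * 60
print(f"{yearly_hours:.2f} hours per year, {monthly_minutes:.1f} minutes per 30 days")
```

At 99.95% the budget is roughly four and a half hours a year, which leaves little room for manual on-call recovery across 100+ modules.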

Why Container?

Why Scheduler?

• Human-caused incidents: debugging, environment changes, etc.
• Unintentional failures: bugs, crashes, OOM, memory leaks, disk full, etc.
• External causes: ad code
• On-call recovery
• Scaling services
• Resource utilization

We are in 2016

We are in 2014

2014Q4: touch lmctfy

2015Q1: try docker with k8s

2015Q2: docker on mesos/yarn?

2015Q3: we are running docker/mesos etc.

2015Q4: more service, ci/release

2016Q1: more batch job & LTS online

How to start?

What can Mesos bring to the team?

[Figure: resource usage on a 100% scale, with segments labeled typical LTS, ad hoc tasks, light services, and Free]

Resource usage distribution DEMO

Typical problems when containerizing services with Docker: SE7EN

1/7

“If you run SSHD in your Docker containers, you're doing it wrong!”

https://jpetazzo.github.io/2014/06/23/docker-ssh-considered-evil/

–Jérôme Petazzoni

2/7 Where are my debug logs?

3/7 Is Docker network performance poor?

http://machinezone.github.io/research/networking-solutions-for-kubernetes/

4/7 How to write local files? How to persist storage?

5/7 Service registration and discovery?

[Figure: "We’re ... OR ..." (two alternatives shown as logos)]

6/7 How to make services schedulable?

This is a big problem, left to each Dev engineer.

7/7 How do servers load their data?

Abandon: rsync, cp, scp, ftp

Embrace: Everything API/Thrift

Marathon Framework on MESOS

Chronos Framework on MESOS

Chronos: the replacement for batch jobs on distributed systems

                               chronos        cron    azkaban
distributed                    Yes            No      half
Web UI                         Yes            No      Yes
Job history                    Yes, simple    Manual  Yes, full
dependency                     Yes, simple    No      Yes, full
User Auth                      No             No      Yes
Resource limit (cpu/mem/disk)  Yes            No      No
Debug log                      mesos sandbox  Manual  web UI
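
To make the cron comparison concrete, here is a minimal sketch of what a Chronos job definition with resource limits might look like when built in Python. The field names follow the Chronos job JSON as commonly documented (`name`, `command`, `schedule`, `epsilon`, `owner`, `cpus`, `mem`, `disk`); the job name, command, owner address, and resource values are hypothetical, and the exact fields accepted can vary by Chronos version.

```python
import json
from datetime import datetime, timezone


def iso8601_schedule(start: datetime, period: str, repeat: str = "") -> str:
    """Build an ISO 8601 repeating interval: R[n]/<start>/<period>.

    An empty repeat count means "repeat forever". `start` must be timezone-aware.
    """
    start_utc = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"R{repeat}/{start_utc}/{period}"


def chronos_job(name: str, command: str, schedule: str,
                cpus: float = 0.5, mem: int = 256, disk: int = 512,
                owner: str = "ops@example.com") -> dict:
    """Assemble a job definition; resource values are illustrative defaults."""
    return {
        "name": name,
        "command": command,
        "schedule": schedule,
        "epsilon": "PT15M",   # how late the job may still start
        "owner": owner,       # hypothetical contact address
        "cpus": cpus,         # per-job resource limits, which plain cron lacks
        "mem": mem,
        "disk": disk,
    }


# A hypothetical daily report job starting 2016-01-01 02:00 UTC.
start = datetime(2016, 1, 1, 2, 0, tzinfo=timezone.utc)
job = chronos_job("daily-report", "python report.py", iso8601_schedule(start, "P1D"))
print(json.dumps(job, indent=2))
```

The resulting JSON would be submitted to the Chronos REST API; the endpoint path is omitted here because it differs between Chronos versions.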

Things to watch out for when running Docker/Mesos in practice

Health check with Marathon on Mesos

{
  "protocol": "COMMAND",
  "command": { "value": "curl -f -X GET http://$HOST:$PORT0/health" },
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3
}
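
The `curl -f` check above only passes if the service answers `/health` with a non-error HTTP status. A minimal sketch of such an endpoint follows, using only the Python standard library. The handler, the `make_server` helper, and the `ok` response body are assumptions for illustration; the slides do not show how the real services implement `/health`.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200; everything else gets 404."""

    def do_GET(self):
        if self.path == "/health":
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence per-request logging so periodic probes do not flood the log.
        pass


def make_server(port: int) -> HTTPServer:
    """Bind on all interfaces; port 0 lets the OS pick a free port."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)


# In a container started by Marathon, the port would come from $PORT0, e.g.:
#   make_server(int(os.environ["PORT0"])).serve_forever()
```

Because `curl -f` exits non-zero on HTTP 4xx/5xx responses, a real handler would return an error status whenever the service cannot do useful work, not just when the process is down.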

{
  "protocol": "COMMAND",
  "portIndex": 0,
  "command": { "value": "nc localhost 8119" },
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3,
  "ignoreHttp1xx": false
}
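
The timing fields in both health checks determine how long a broken task can keep running before Marathon replaces it. The sketch below estimates that delay. It rests on a simplifying assumption that is not from the slides: each failing probe uses its full timeout and the next probe starts one interval later. Treat the result as a rough upper bound, not Marathon's exact scheduling behavior.

```python
def worst_case_detection_seconds(check: dict) -> int:
    """Rough upper bound from a task going bad to it being declared unhealthy."""
    per_probe = check["intervalSeconds"] + check["timeoutSeconds"]
    return check["maxConsecutiveFailures"] * per_probe


def worst_case_startup_seconds(check: dict) -> int:
    """Rough upper bound for a task that never becomes healthy after launch.

    Failures during the grace period are ignored, so the failure count
    only starts once gracePeriodSeconds have elapsed.
    """
    return check["gracePeriodSeconds"] + worst_case_detection_seconds(check)


# Timing values taken from the health check definitions above.
check = {
    "gracePeriodSeconds": 300,
    "intervalSeconds": 60,
    "timeoutSeconds": 20,
    "maxConsecutiveFailures": 3,
}
print(worst_case_detection_seconds(check), worst_case_startup_seconds(check))
```

Under these assumptions, the settings shown trade fast failure detection for tolerance of slow-starting services; shorter intervals detect failures sooner at the cost of more probe traffic.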

Marathon port resource

--resources="ports(*):[8000-9000, 31000-32000]"
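
To check whether a service's port falls inside the ranges offered by that `--resources` flag, the string can be parsed into numeric ranges. This is a minimal sketch: the helper names are hypothetical, and it only handles the `ports(*):[low-high, ...]` form shown above, not the full Mesos resource syntax.

```python
import re
from typing import List, Tuple


def parse_port_ranges(resources: str) -> List[Tuple[int, int]]:
    """Extract (low, high) port ranges from a string like ports(*):[a-b, c-d]."""
    match = re.search(r"ports\(\*\):\[([^\]]*)\]", resources)
    if not match:
        return []
    ranges = []
    for part in match.group(1).split(","):
        low, high = part.strip().split("-")
        ranges.append((int(low), int(high)))
    return ranges


def port_offered(port: int, ranges: List[Tuple[int, int]]) -> bool:
    """True if the port lies inside any offered range (bounds inclusive)."""
    return any(low <= port <= high for low, high in ranges)


ranges = parse_port_ranges("ports(*):[8000-9000, 31000-32000]")
print(ranges, port_offered(8119, ranges))
```

With the ranges above, port 8119 from the `nc` health check is schedulable, while a service hard-coded to a port outside both ranges could never be placed.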

Dockerfile review rules

• Every Dockerfile must go through code review
• Everything in codebase: code/config
• Do not use unstable wget/curl sources
• Port resources must be requested and registered
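
The rule against unstable wget/curl sources lends itself to an automated pre-review check. The sketch below flags download URLs whose host is not on an allowlist. The allowlist contents and the function name are hypothetical, and the line-by-line scan is a simplification: it misses URLs that sit on a continuation line separate from the `wget` or `curl` command.

```python
import re
from typing import List, Set, Tuple
from urllib.parse import urlparse

# Hypothetical allowlist of stable internal mirrors.
TRUSTED_HOSTS = {"mirror.internal.example"}

URL_RE = re.compile(r"https?://[^\s\"'\\]+")
DOWNLOADER_RE = re.compile(r"\b(wget|curl)\b")


def untrusted_downloads(dockerfile_text: str,
                        trusted_hosts: Set[str] = TRUSTED_HOSTS
                        ) -> List[Tuple[int, str]]:
    """Return (line number, url) for wget/curl URLs outside the allowlist."""
    findings = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        if not DOWNLOADER_RE.search(line):
            continue
        for url in URL_RE.findall(line):
            if urlparse(url).hostname not in trusted_hosts:
                findings.append((lineno, url))
    return findings


sample = (
    "FROM centos:7\n"
    "RUN wget http://example.org/pkg.tar.gz\n"
    "RUN curl -fsSL https://mirror.internal.example/tool.sh | sh\n"
)
print(untrusted_downloads(sample))
```

A check like this could run in CI ahead of the human code review, so reviewers only see Dockerfiles that already pass the mechanical rules.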

Q&A ?

ye.mikez@gmail.com
zhangye@mvad.com
