ffunction inc. Fabric, Cuisine & Watchdog Sébastien Pierre, ffunction inc. @Montréal Python, February 2011 www.ffctn.com
Transcript
Page 1: Server Administration in Python with Fabric, Cuisine and Watchdog

ffunction inc.

Fabric, Cuisine & Watchdog

Sébastien Pierre, ffunction inc. @Montréal Python, February 2011

www.ffctn.com

Page 2: Server Administration in Python with Fabric, Cuisine and Watchdog

How to use Python for server administration

Thanks to Fabric, Cuisine* & Watchdog*

*custom tools

Page 3: Server Administration in Python with Fabric, Cuisine and Watchdog

The way we use servers has changed

Page 4: Server Administration in Python with Fabric, Cuisine and Watchdog

The era of dedicated servers

[Diagram: web server, database server and email server, hosted in your server room or in colocation]

Sysadmins typically SSH in and configure the servers live.

The servers are conservatively managed; updates are risky.

Page 7: Server Administration in Python with Fabric, Cuisine and Watchdog

The era of slices/VPS

[Diagram: slices hosted at Linode.com and Amazon EC2, plus dedicated servers 1 and 2 at IWeb.com]

We now have multiple small virtual servers (slices/VPS), often located in different data centers and sometimes with different providers.

We even sometimes still have physical, dedicated servers.

Page 11: Server Administration in Python with Fabric, Cuisine and Watchdog

The challenge

● Order server
● Set up server: create users and groups, customize config files, install base packages
● Deploy application: install app-specific packages, deploy the application, start services

Make this process as fast (and simple) as possible.

Quickly integrate your new server in the existing architecture... and make sure it's running!

Page 20: Server Administration in Python with Fabric, Cuisine and Watchdog

Today's menu

FABRIC: interact with your remote machines as if they were local

CUISINE: takes care of users, groups, packages and configuration of your new machine

WATCHDOG: ensures that your servers and services are up and running

Cuisine and Watchdog are custom tools, made by ffunction inc.

Page 24: Server Administration in Python with Fabric, Cuisine and Watchdog

Part 1

Fabric - http://fabfile.org

Application deployment & systems administration tasks

Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.

Wait... what does that mean?

Page 27: Server Administration in Python with Fabric, Cuisine and Watchdog

Streamlining SSH

By hand:

    import os
    version = os.popen("ssh myserver 'cat /proc/version'").read()

Using Fabric:

    from fabric.api import *
    env.hosts = ["myserver"]
    version = run("cat /proc/version")

You can specify multiple hosts and run the same commands across them.

Connections will be lazily created and pooled.

Failures ($STATUS) will be handled just like in Make: a command that exits with a non-zero status stops the run.
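The Make-like failure handling can be illustrated locally with the standard library alone. This is a minimal sketch, not Fabric's implementation: run_local and CommandFailed are hypothetical names, the command runs on the local machine instead of over SSH, and the warn_only flag only mirrors the idea of Fabric's setting of the same name.

```python
import subprocess
import sys


class CommandFailed(Exception):
    """Raised when a command exits with a non-zero status (hypothetical helper)."""

    def __init__(self, command, status, output):
        super().__init__("%r exited with status %d" % (command, status))
        self.command = command
        self.status = status
        self.output = output


def run_local(command, warn_only=False):
    """Run `command` (a list of arguments) locally and return its output as text.

    Like a Make rule, a non-zero exit status aborts unless warn_only is set.
    """
    process = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = process.stdout.decode("utf-8", "replace").strip()
    if process.returncode != 0 and not warn_only:
        raise CommandFailed(command, process.returncode, output)
    return output


# Usage: the Python interpreter itself stands in for a remote command.
ok = run_local([sys.executable, "-c", "print('Linux version')"])
```

Raising an exception on failure keeps the calling rule short: a sequence of run_local() calls stops at the first broken step, and the caller opts into tolerance explicitly with warn_only=True.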

Page 32: Server Administration in Python with Fabric, Cuisine and Watchdog

Example: Installing packages

    # Unconditional install
    sudo("aptitude install nginx")

    # Install only if dpkg does not already report the package as installed
    package = "nginx"
    if run("dpkg -s %s | grep 'Status:' ; true" % package).find("installed") == -1:
        sudo("aptitude install '%s'" % (package))

It's easy to take action depending on the result.

Note that we add true so that the run() always succeeds*.

* There are other ways...
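A plain substring search for "installed" also matches dpkg states such as "not-installed" or "half-installed". The sketch below shows one stricter way to interpret the output of `dpkg -s`; the function name dpkg_status_is_installed is hypothetical, and it only parses text, so fetching that text with run() is left to the caller.

```python
def dpkg_status_is_installed(dpkg_output):
    """Return True when `dpkg -s` output reports the package as installed.

    The Status line holds three words (want, error flag, state),
    for example "install ok installed".
    """
    for line in dpkg_output.splitlines():
        if line.startswith("Status:"):
            words = line[len("Status:"):].split()
            return len(words) == 3 and words[2] == "installed"
    # No Status line at all: treat the package as not installed.
    return False
```

With Fabric, the text passed in would typically be the result of a run("dpkg -s ... ; true") call like the one on the slide.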

Page 35: Server Administration in Python with Fabric, Cuisine and Watchdog

Example: retrieving system status

    disk_usage = run("df -kP")
    mem_usage = run("cat /proc/meminfo")
    cpu_usage = run("cat /proc/stat")

    print disk_usage, mem_usage, cpu_usage

Very useful for getting live information from many different servers.
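Once the raw text is back on the local side, it is ordinary Python data. As a sketch of what could be done with the output of `df -kP`, the function below turns it into a list of dictionaries. The name parse_df and the SAMPLE text are made up for illustration; the parser assumes the POSIX column layout and filesystem names without spaces.

```python
def parse_df(df_output):
    """Parse `df -kP` output into a list of dicts, one per filesystem."""
    entries = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # ignore lines that do not have all six columns
        filesystem, blocks, used, available, capacity, mount = fields
        entries.append({
            "filesystem": filesystem,
            "total_kb": int(blocks),
            "used_kb": int(used),
            "available_kb": int(available),
            "capacity_percent": int(capacity.rstrip("%")),
            "mount": mount,
        })
    return entries


# Made-up sample in the POSIX `df -kP` layout.
SAMPLE = """Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 10000000 9200000 800000 92% /
tmpfs 512000 0 512000 0% /dev/shm
"""

almost_full = [e["mount"] for e in parse_df(SAMPLE) if e["capacity_percent"] >= 90]
```

Running the same parser over the df output of every host gives a quick list of the mount points that need attention.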

Page 37: Server Administration in Python with Fabric, Cuisine and Watchdog

Fabfile.py

    from fabric.api import *
    from mysetup import *

    env.hosts = ["server1.myapp.com"]

    def setup():
        install_packages("...")
        update_configuration()
        create_users()
        start_daemons()

    $ fab setup

Just like Make, you write rules that do something... and you can specify on which servers the rules will run.

Page 40: Server Administration in Python with Fabric, Cuisine and Watchdog

Multiple hosts

    env.hosts = [
        "db1.myapp.com",
        "db2.myapp.com",
        "db3.myapp.com"
    ]

    @hosts("db1.myapp")
    def backup_db():
        run(...)

Page 41: Server Administration in Python with Fabric, Cuisine and Watchdog

Roles

    env.roledefs = {
        'web': ['www1', 'www2', 'www3'],
        'dns': ['ns1', 'ns2']
    }

    $ fab -R web setup

Will run the setup rule only on hosts that are members of the web role.
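Conceptually, selecting a role boils down to looking up host lists in the roledefs dictionary. The sketch below illustrates that lookup; hosts_for_roles is a hypothetical helper, not part of Fabric, and how Fabric itself orders or de-duplicates hosts is not claimed here.

```python
def hosts_for_roles(roledefs, roles):
    """Return the hosts belonging to the given roles, in order, without duplicates."""
    hosts = []
    for role in roles:
        if role not in roledefs:
            raise KeyError("Unknown role: %s" % role)
        for host in roledefs[role]:
            if host not in hosts:
                hosts.append(host)
    return hosts


roledefs = {
    "web": ["www1", "www2", "www3"],
    "dns": ["ns1", "ns2"],
}
web_hosts = hosts_for_roles(roledefs, ["web"])
```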

Page 43: Server Administration in Python with Fabric, Cuisine and Watchdog

Some facts about Fabric

Fabric 1.0 just released! On March 4th, 2011

3 years of development: first commit 1161 days ago (as of March 10th, 2011)

Related projects: Opscode's Chef, and Puppet

Page 44: Server Administration in Python with Fabric, Cuisine and Watchdog

What's good about Fabric?

Low-level: basically an ssh() command that returns the result

Simple primitives: run(), sudo(), get(), put(), local(), prompt(), reboot()

No magic: no DSL, no abstraction, just a remote command API

Page 45: Server Administration in Python with Fabric, Cuisine and Watchdog

What could be improved?

Ease common admin tasks: user and group creation, file and directory operations

Abstract primitives: such as "install package", so that it works with different OSes

Templates: to make creating/updating configuration files easy

Page 46: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: Chef-like functionality for Fabric

Page 47: Server Administration in Python with Fabric, Cuisine and Watchdog

Part 2

Cuisine

Page 48: Server Administration in Python with Fabric, Cuisine and Watchdog

What is Opscode's Chef?

Recipes: scripts/packages to install and configure services and applications

API: a DSL-like Ruby API to interact with the OS (create users, groups, install packages, etc.)

Architecture: client-server or "solo" mode to push and deploy your new configurations

http://wiki.opscode.com/display/chef/Home

Page 49: Server Administration in Python with Fabric, Cuisine and Watchdog

What I liked about Chef

Flexible: you can use the API or shell commands

Structured: helped me have a clear decomposition of the services installed per machine

Community: lots of recipes already available from http://cookbooks.opscode.com/

Page 50: Server Administration in Python with Fabric, Cuisine and Watchdog

What I didn't like

Too many files and directories: code is spread out, hard to get the big picture

Abstraction overload: API not very well documented, frequent fallbacks to plain shell scripts within the recipe

No "smart" recipes: recipes are applied all the time, even when it's not necessary

Page 51: Server Administration in Python with Fabric, Cuisine and Watchdog

The question that kept coming...

Django recipe: 5 files, 2 directories

What it does, in essence:

    sudo aptitude install apache2 python python-django

Is this really necessary for what I want to do?

Page 53: Server Administration in Python with Fabric, Cuisine and Watchdog

What I loved about Fabric

Bare metal: ssh() function, simple and elegant set of primitives

No magic: no abstraction, no model, no compilation

Two-way communication: easy to change the rule's behaviour according to the output (e.g. do not install something that's already installed)

Page 54: Server Administration in Python with Fabric, Cuisine and Watchdog

What I needed

Fabric, extended with:

● File I/O
● User/group management
● Package management
● Text processing & templates

Page 59: Server Administration in Python with Fabric, Cuisine and Watchdog

How I wanted it

Simple "flat" API: [object]_[operation], where operation is one of "create", "read", "update", "write", "remove", "ensure", etc.

Driven by need: only implement a feature if I have a real need for it

No magic: everything is implemented using sh-compatible commands

No unnecessary structure: everything fits in one file, no imposed file layout

Page 60: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: Example fabfile.py

    from cuisine import *

    env.hosts = ["server1.myapp.com"]

    def setup():
        package_ensure("python", "apache2", "python-django")
        user_ensure("admin", uid=2000)
        upstart_ensure("django")

    $ fab setup

Fabric's core functions are already imported by the cuisine import. The three calls in setup() are Cuisine's API calls.

Page 63: Server Administration in Python with Fabric, Cuisine and Watchdog

File I/O

Page 64: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: File I/O

● file_exists    does the remote file exist?
● file_read      reads the remote file
● file_write     writes data to the remote file
● file_append    appends data to the remote file
● file_attribs   chmod & chown
● file_remove

Supports owner/group and mode change.

Page 66: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: File I/O (directories)

● dir_exists     does the remote directory exist?
● dir_ensure     ensures that a directory exists
● dir_attribs    chmod & chown
● dir_remove

Page 67: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: File I/O +

● file_update(location, updater=lambda _:_)

    package_ensure("mongodb-snapshot")

    def update_configuration(text):
        res = []
        for line in text.split("\n"):
            if line.strip().startswith("dbpath="):
                res.append("dbpath=/data/mongodb")
            elif line.strip().startswith("logpath="):
                res.append("logpath=/data/logs/mongodb.log")
            else:
                res.append(line)
        return "\n".join(res)

    file_update("/etc/mongodb.conf", update_configuration)

This replaces the values for the configuration entries dbpath and logpath.

The remote file will only be changed if the content is different.
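The "only changed if the content is different" behaviour can be sketched on a local file. This is an illustration under assumptions, not Cuisine's code: file_update_local and set_dbpath are hypothetical names, and the file lives on the local disk where Cuisine would work on the remote host.

```python
import os
import tempfile


def file_update_local(location, updater=lambda text: text):
    """Apply `updater` to the file's text; write back only if the text changed.

    Returns True when the file was rewritten, False when it was left untouched.
    """
    with open(location, "r") as handle:
        original = handle.read()
    updated = updater(original)
    if updated == original:
        return False
    with open(location, "w") as handle:
        handle.write(updated)
    return True


def set_dbpath(text):
    """Point any dbpath= entry at /data/mongodb, leaving other lines alone."""
    return "\n".join(
        "dbpath=/data/mongodb" if line.strip().startswith("dbpath=") else line
        for line in text.split("\n")
    )


# Usage on a throwaway file: the first call rewrites it, the second is a no-op.
fd, path = tempfile.mkstemp(suffix=".conf")
with os.fdopen(fd, "w") as handle:
    handle.write("dbpath=/var/lib/mongodb\nport=27017\n")
first = file_update_local(path, set_dbpath)
second = file_update_local(path, set_dbpath)
with open(path) as handle:
    content = handle.read()
os.remove(path)
```

Because the updater is a pure text-to-text function, the rule can be run again and again without touching a file that is already in the desired state.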

Page 70: Server Administration in Python with Fabric, Cuisine and Watchdog

User Management

Page 71: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: User Management

● user_exists    does the user exist?
● user_create    creates the user
● user_ensure    creates the user if it doesn't exist

Page 72: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: Group Management

● group_exists        does the group exist?
● group_create        creates the group
● group_ensure        creates the group if it doesn't exist
● group_user_exists   does the user belong to the group?
● group_user_add      adds the user to the group
● group_user_ensure

Page 73: Server Administration in Python with Fabric, Cuisine and Watchdog

Package Management

Page 74: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: Package Management

● package_exists      is the package available?
● package_installed   is it installed?
● package_install     installs the package
● package_ensure      ... only if it's not installed
● package_upgrade     upgrades the/all package(s)
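The exists/create/ensure triplets listed for users, groups and packages share one pattern: check first, act only when needed. The sketch below shows that pattern with an in-memory set standing in for the remote system. The function names echo the naming scheme, but these are local stand-ins written for illustration, not Cuisine's implementations.

```python
def ensure(exists, create, name):
    """Create `name` only if it does not exist; return True when something was created."""
    if exists(name):
        return False
    create(name)
    return True


# In-memory stand-in for the remote user database (illustration only).
users = set(["root"])
calls = []


def user_exists(name):
    return name in users


def user_create(name):
    calls.append(name)  # record every real creation
    users.add(name)


def user_ensure(name):
    return ensure(user_exists, user_create, name)


created_first = user_ensure("admin")   # creates the user
created_again = user_ensure("admin")   # no-op: the user already exists
```

Running an ensure-style rule twice is harmless, which is what makes a setup rule safe to re-run on a machine that is already partly configured.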

Page 75: Server Administration in Python with Fabric, Cuisine and Watchdog

Text & Templates

Page 76: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: Text transformation

text_ensure_line(text, lines)

    file_update("/home/user/.profile", lambda _: text_ensure_line(_,
        "PYTHONPATH=/opt/lib/python:${PYTHONPATH};"
        "export PYTHONPATH"
    ))

Ensures that the PYTHONPATH variable is set and exported. If not, these lines will be appended.
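The "append only if missing" idea can be sketched as a pure function over text. ensure_lines below is a hypothetical stand-in written for illustration; how Cuisine's text_ensure_line compares lines (for example, whether it ignores surrounding whitespace) is an assumption here.

```python
def ensure_lines(text, *lines):
    """Append each of `lines` to `text` unless an identical line
    (ignoring surrounding whitespace) is already present."""
    present = set(line.strip() for line in text.split("\n"))
    result = text
    for line in lines:
        if line.strip() in present:
            continue
        if result and not result.endswith("\n"):
            result += "\n"  # make sure the new line starts on its own line
        result += line + "\n"
        present.add(line.strip())
    return result


profile = "export EDITOR=vi\n"
once = ensure_lines(profile, "export PYTHONPATH")
twice = ensure_lines(once, "export PYTHONPATH")  # second pass changes nothing
```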

Page 78: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: Text transformation

text_replace_line(text, old, new, find=.., process=...)

    configuration = local_read("server.conf")
    for key, value in variables.items():
        configuration, replaced = text_replace_line(
            configuration,
            key + "=",
            key + "=" + repr(value),
            process=lambda text: text.split("=")[0].strip()
        )

Replaces lines that look like VARIABLE=VALUE with the actual values from the variables dictionary.

The process lambda transforms input lines before comparing them. Here the lines are stripped of spaces and of their value.
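To make the role of the process function concrete, here is a minimal sketch of a line replacement that compares processed lines. replace_line is a hypothetical function, not Cuisine's text_replace_line: it leaves out the find parameter shown in the signature, and returning a count as the second value is an assumption.

```python
def replace_line(text, old, new, process=lambda line: line):
    """Replace every line whose processed form equals the processed `old`.

    Returns the new text and the number of lines replaced.
    """
    target = process(old)
    result = []
    replaced = 0
    for line in text.split("\n"):
        if process(line) == target:
            result.append(new)
            replaced += 1
        else:
            result.append(line)
    return "\n".join(result), replaced


def key_only(line):
    """Keep only the variable name: drop the value and surrounding spaces."""
    return line.split("=")[0].strip()


configuration = "HOST=localhost\nPORT = 80\nDEBUG=0"
variables = {"PORT": 8080}
total = 0
for key, value in variables.items():
    configuration, replaced = replace_line(
        configuration, key + "=", key + "=" + repr(value), process=key_only)
    total += replaced
```

Because both the searched line and each input line go through key_only, "PORT = 80" and "PORT=" compare equal even though their spacing and values differ.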

Page 81: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: Text transformation

text_strip_margin(text)

    file_write(".profile", text_strip_margin("""
        |export PATH="$HOME/bin":$PATH
        |set -o vi
    """))

Everything after the | separator will be output as content.

This makes it easy to embed text templates within functions.
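The margin trick is easy to reproduce with plain string operations. strip_margin below is a hypothetical sketch of the idea; dropping lines that contain no separator at all is an assumption, not confirmed Cuisine behaviour.

```python
def strip_margin(text, margin="|"):
    """For each line containing `margin`, keep only what follows its first
    occurrence; lines without the separator are dropped."""
    result = []
    for line in text.split("\n"):
        before, separator, after = line.partition(margin)
        if separator:
            result.append(after)
    return "\n".join(result)


profile = strip_margin("""
    |export PATH="$HOME/bin":$PATH
    |set -o vi
""")
```

The template can then be indented to match the surrounding Python code without that indentation leaking into the generated file.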

Page 83: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: Text transformation

text_template(text, variables)

    text_template(text_strip_margin("""
        |cd ${DAEMON_PATH}
        |exec ${DAEMON_EXEC_PATH}
    """), dict(
        DAEMON_PATH="/opt/mongodb",
        DAEMON_EXEC_PATH="/opt/mongodb/mongod"
    ))

This is a simple wrapper around Python's (safe) string.Template class.
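The standard-library behaviour being wrapped can be tried directly. In this short sketch, render is a hypothetical helper and the --port ${PORT} placeholder is added purely to show what "safe" substitution does with a variable that has no value.

```python
from string import Template


def render(text, variables):
    """Substitute ${NAME} placeholders, leaving unknown ones untouched."""
    return Template(text).safe_substitute(variables)


script = render(
    "cd ${DAEMON_PATH}\nexec ${DAEMON_EXEC_PATH} --port ${PORT}",
    dict(DAEMON_PATH="/opt/mongodb", DAEMON_EXEC_PATH="/opt/mongodb/mongod"),
)
```

With safe_substitute() a missing variable is left in place instead of raising KeyError, which suits shell snippets that contain their own ${...} references.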

Page 85: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine: Goodies

● ssh_keygen       generates DSA keys
● ssh_authorize    authorizes your key on the remote server
● mode_sudo        run() always uses sudo
● upstart_ensure   ensures the given daemon is running

& more!

Page 86: Server Administration in Python with Fabric, Cuisine and Watchdog

Cuisine Tips: Structuring your rules

BOOTSTRAP: You just received your new VPS, and you want to set it up so that you have a base system that you can access without typing a password.

    def bootstrap():
        # Secure SSH, create admin user
        # Authorize SSH public keys
        # Remove unwanted packages
        pass

SETUP: You install your users, groups, preferred packages and configuration. You also install your applications.

    def setup():
        # Create directories (ex: /opt/data, /opt/services, etc)
        # Create user/groups (ex: apps, services, etc)
        # Install base tools (ex: screen, fail2ban, zsh, etc)
        # Edit configuration (ex: profile, inputrc, etc)
        # Install and run your application
        pass

UPDATE: You want to deploy the new version of the application you just built.

    def update():
        # Download your application update
        # Freeze/stop the running application
        # Install the update
        # Reload/restart your application
        # Test that everything is OK
        pass

Page 95: Server Administration in Python with Fabric, Cuisine and Watchdog

Why use Cuisine?

● Simple API for remote-server manipulation: files, users, groups, packages
● Shell commands for specific tasks only: avoid problems with your shell commands by only using run() for very specific tasks
● Cuisine tasks are not stupid: *_ensure() commands won't do anything if it's not necessary

Page 96: Server Administration in Python with Fabric, Cuisine and Watchdog

Limitations

● Limited to sh-shells: operations will not work under csh
● Only written/tested for Ubuntu Linux: contributors could easily port commands

Page 97: Server Administration in Python with Fabric, Cuisine and Watchdog

Get started!

On GitHub: http://github.com/sebastien/cuisine

1 short Python file, documented API

Page 98: Server Administration in Python with Fabric, Cuisine and Watchdog

Part 3

Watchdog

Server and services monitoring

Page 99: Server Administration in Python with Fabric, Cuisine and Watchdog

The problem

● Low disk space: archive files, rotate logs, purge cache
● HTTP server has high latency: restart HTTP server
● System load is too high: re-nice important processes

Page 106: Server Administration in Python with Fabric, Cuisine and Watchdog

We want to be notified when problems occur

Page 107: Server Administration in Python with Fabric, Cuisine and Watchdog

We want automatic actions to be taken whenever possible

Page 108: Server Administration in Python with Fabric, Cuisine and Watchdog

(Some of the) existing solutions

Monit, God, Supervisord, Upstart: focus on starting/restarting daemons and services

Munin, Cacti: focus on visualization of RRDTool data

Collectd: focus on collecting and publishing data

Page 109: Server Administration in Python with Fabric, Cuisine and Watchdog

The ideal tool

Wide spectrum: data collection, service monitoring, actions

Easy setup and deployment: no complex installation or configuration

Flexible server architecture: can monitor local or remote processes

Customizable and extensible: from restarting daemons to monitoring whole servers

Page 110: Server Administration in Python with Fabric, Cuisine and Watchdog

ffunctioninc.

Hello, Watchdog!

SERVICE

Page 111: Server Administration in Python with Fabric, Cuisine and Watchdog

ffunctioninc.

Hello, Watchdog!

RULE

SERVICE

Page 112: Server Administration in Python with Fabric, Cuisine and Watchdog

ffunctioninc.

Hello, Watchdog!

RULE

SERVICE

A service is acollection of

RULES

A service is acollection of

RULES

Page 113: Server Administration in Python with Fabric, Cuisine and Watchdog

ffunctioninc.

Hello, Watchdog!

RULE

SERVICE

HTTP RequestCPU, Disk, Mem %Process statusI/O Bandwidth

Page 114: Server Administration in Python with Fabric, Cuisine and Watchdog

Hello, Watchdog!

Each rule retrieves data and processes it. Rules can SUCCEED or FAIL.

Page 115: Server Administration in Python with Fabric, Cuisine and Watchdog

Hello, Watchdog!

RULE

ACTION

SERVICE

Page 116: Server Administration in Python with Fabric, Cuisine and Watchdog

Hello, Watchdog!

RULE

ACTION

SERVICE

Actions:
Logging
XMPP, email notifications
Start/stop process
…

Page 117: Server Administration in Python with Fabric, Cuisine and Watchdog

Hello, Watchdog!

Actions are bound to a rule and triggered on rule SUCCESS or FAILURE.

Page 118: Server Administration in Python with Fabric, Cuisine and Watchdog

Execution Model

MONITOR

Page 119: Server Administration in Python with Fabric, Cuisine and Watchdog

Execution Model

MONITOR

RULE (frequency in ms)

SERVICE DEFINITION

Page 120: Server Administration in Python with Fabric, Cuisine and Watchdog

Execution Model

Services are registered in the monitor.

Page 121: Server Administration in Python with Fabric, Cuisine and Watchdog

Execution Model

Rules defined in the service are executed every N ms (frequency).

Page 122: Server Administration in Python with Fabric, Cuisine and Watchdog

Execution Model

MONITOR

RULE (frequency in ms)

ACTION, ACTION, ACTION

SERVICE DEFINITION

SUCCESS / FAILURE

Page 123: Server Administration in Python with Fabric, Cuisine and Watchdog

Execution Model

If the rule SUCCEEDS, the success actions will be executed sequentially.

Page 124: Server Administration in Python with Fabric, Cuisine and Watchdog

Execution Model

If the rule FAILS, the failure actions will be executed sequentially.
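The execution model described on these slides can be sketched in a few lines of plain Python. This is not Watchdog's implementation: the names `Rule`, `Service`, `Monitor` and `run_once` are hypothetical, and the sketch makes a single pass over the rules instead of scheduling each one every N ms.

```python
class Rule:
    """A check plus the actions to run on success or failure (hypothetical)."""

    def __init__(self, check, freq_ms, success=(), fail=()):
        self.check = check        # callable returning True (success) or False (failure)
        self.freq_ms = freq_ms    # how often a real monitor would re-run the check
        self.success = list(success)
        self.fail = list(fail)

    def run(self):
        ok = bool(self.check())
        # Actions bound to the outcome are executed sequentially, in order.
        for action in (self.success if ok else self.fail):
            action()
        return ok


class Service:
    """A named collection of rules."""

    def __init__(self, name, rules):
        self.name = name
        self.rules = list(rules)


class Monitor:
    """Holds the registered services and runs their rules."""

    def __init__(self, *services):
        self.services = list(services)

    def run_once(self):
        # A real monitor would loop and honour each rule's frequency;
        # one pass is enough to show the dispatch logic.
        return {s.name: [r.run() for r in s.rules] for s in self.services}


log = []
monitor = Monitor(
    Service("demo", [
        Rule(lambda: True, 1000, success=[lambda: log.append("up")]),
        Rule(lambda: False, 1000, fail=[lambda: log.append("alert"),
                                        lambda: log.append("restart")]),
    ])
)
results = monitor.run_once()
```

The first rule succeeds and runs its single success action; the second fails and runs its two failure actions in the order they were listed.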

Page 125: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring a remote machine

    #!/usr/bin/env python
    from watchdog import *
    Monitor(
        Service(
            name = "google-search-latency",
            monitor = (
                HTTP(
                    GET="http://www.google.ca/search?q=watchdog",
                    freq=Time.s(1),
                    timeout=Time.ms(80),
                    fail=[
                        # NOTE: the message says 50ms although the timeout is 80ms
                        Print("Google search query took more than 50ms")
                    ]
                )
            )
        )
    ).run()

Page 126: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring a remote machine

A monitor is like the “main” for Watchdog. It actively monitors services.

Page 127: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring a remote machine

Don't forget to call run() on it.

Page 128: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring a remote machine

The service monitors the rules.

Page 129: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring a remote machine

The HTTP rule lets us test a URL.

And we display a message in case of failure.

Page 130: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring a remote machine

If there is a 4XX or it times out, the rule will fail and display an error message.
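To make the success/failure condition concrete, here is a minimal sketch of such a check using only the Python 3 standard library. It is an illustration under stated assumptions, not Watchdog's HTTP rule: `http_check` is a hypothetical helper, and it treats any HTTP error status (4XX or 5XX), connection error or timeout as a failure.

```python
import socket
import urllib.error
import urllib.request


def http_check(url, timeout_s):
    """Return True if `url` answers without an HTTP error, False otherwise."""
    try:
        # The timeout applies to each blocking socket operation,
        # not to the request as a whole.
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            response.read()
        return True
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4XX and 5XX responses.
        return False
    except (urllib.error.URLError, socket.timeout, OSError):
        # Connection refused, DNS failure or timeout.
        return False
```

A rule built on this helper would call it at every tick and run its failure actions whenever it returns False.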

Page 131: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring a remote machine

$ python example-service-monitoring.py

2011-02-27T22:33:18 watchdog --- #0 (runners=1,threads=2,duration=0.57s)
2011-02-27T22:33:18 watchdog [!] Failure on HTTP(GET="www.google.ca:80/search?q=watchdog",timeout=0.08) : Socket error: timed out
Google search query took more than 50ms
2011-02-27T22:33:19 watchdog --- #1 (runners=1,threads=2,duration=0.73s)
2011-02-27T22:33:20 watchdog --- #2 (runners=1,threads=2,duration=0.54s)
2011-02-27T22:33:21 watchdog --- #3 (runners=1,threads=2,duration=0.69s)
2011-02-27T22:33:22 watchdog --- #4 (runners=1,threads=2,duration=0.77s)
2011-02-27T22:33:23 watchdog --- #5 (runners=1,threads=2,duration=0.70s)

Page 132: Server Administration in Python with Fabric, Cuisine and Watchdog

Sending Email Notification

    send_email = Email(
        "[email protected]",
        "[Watchdog]Google Search Latency Error",
        "Latency was over 80ms",
        "smtp.gmail.com", "myusername", "mypassword"
    )

    […]
    HTTP(
        GET="http://www.google.ca/search?q=watchdog",
        freq=Time.s(1),
        timeout=Time.ms(80),
        fail=[
            send_email
        ]
    )

Page 133: Server Administration in Python with Fabric, Cuisine and Watchdog

Sending Email Notification

The Email action will send an email to [email protected] when triggered.

Page 134: Server Administration in Python with Fabric, Cuisine and Watchdog

Sending Email Notification

This is how we bind the action to the rule failure.

Page 135: Server Administration in Python with Fabric, Cuisine and Watchdog

Sending Email+Jabber Notification

    send_xmpp = XMPP(
        "[email protected]",
        "Watchdog: Google search latency over 80ms",
        "[email protected]", "mypassword"
    )

    […]
    HTTP(
        GET="http://www.google.ca/search?q=watchdog",
        freq=Time.s(1),
        timeout=Time.ms(80),
        fail=[
            send_email, send_xmpp
        ]
    )

Page 136: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring incident: when something fails repeatedly during a given period of time

Page 137: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring incident: when something fails repeatedly during a given period of time

You don't want to be notified all the time, only when it really matters.

Page 138: Server Administration in Python with Fabric, Cuisine and Watchdog

Detecting incidents

    HTTP(
        GET="http://www.google.ca/search?q=watchdog",
        freq=Time.s(1),
        timeout=Time.ms(80),
        fail=[
            Incident(
                errors = 5,
                during = Time.s(10),
                actions = [send_email,send_xmpp]
            )
        ]
    )

Page 139: Server Administration in Python with Fabric, Cuisine and Watchdog

Detecting incidents

An incident is a “smart” action: it will only do something when the condition is met.

Page 140: Server Administration in Python with Fabric, Cuisine and Watchdog

Detecting incidents

When at least 5 errors...

Page 141: Server Administration in Python with Fabric, Cuisine and Watchdog

Detecting incidents

...happen over a 10-second period

Page 142: Server Administration in Python with Fabric, Cuisine and Watchdog

Detecting incidents

The Incident action will trigger the given actions.
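The "at least 5 errors over a 10-second period" condition amounts to a sliding-window counter. The sketch below shows one way to write it in plain Python; `IncidentSketch` and its `clock` parameter are hypothetical, and resetting the counter after firing is an assumption rather than documented Watchdog behaviour.

```python
import time
from collections import deque


class IncidentSketch:
    """Fire `actions` once `errors` failures fall within `during_s` seconds."""

    def __init__(self, errors, during_s, actions, clock=time.monotonic):
        self.errors = errors
        self.during_s = during_s
        self.actions = list(actions)
        self.clock = clock          # injectable so the example needs no sleeping
        self.failures = deque()     # timestamps of recent failures

    def __call__(self):
        """Record one failure; return True if the incident fired."""
        now = self.clock()
        self.failures.append(now)
        # Forget failures that are older than the observation window.
        while self.failures and now - self.failures[0] > self.during_s:
            self.failures.popleft()
        if len(self.failures) >= self.errors:
            for action in self.actions:
                action()
            self.failures.clear()   # assumption: start counting again after firing
            return True
        return False


fired = []
ticks = iter([0, 1, 2, 3, 4, 20, 21])   # fake timestamps, in seconds
incident = IncidentSketch(5, 10, [lambda: fired.append("alert")],
                          clock=lambda: next(ticks))
outcomes = [incident() for _ in range(7)]
```

With these fake timestamps only the fifth failure fires the actions; the two later failures are too few to count as a new incident.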

Page 143: Server Administration in Python with Fabric, Cuisine and Watchdog

Example: Ensuring a service is running

    from watchdog import *
    Monitor(
        Service(
            name="myservice-ensure-up",
            monitor=(
                HTTP(
                    GET="http://localhost:8000/",
                    freq=Time.ms(500),
                    fail=[
                        Incident(
                            errors=5,
                            during=Time.s(5),
                            actions=[
                                Restart("myservice-start.py")
                            ]
                        )
                    ]
                )
            )
        )
    ).run()

Page 144: Server Administration in Python with Fabric, Cuisine and Watchdog

Example: Ensuring a service is running

Every 500ms, we test whether we can GET http://localhost:8000

Page 145: Server Administration in Python with Fabric, Cuisine and Watchdog

Example: Ensuring a service is running

If we can't reach it for 5 seconds

Page 146: Server Administration in Python with Fabric, Cuisine and Watchdog

Example: Ensuring a service is running

We kill and restart myservice-start.py

Page 147: Server Administration in Python with Fabric, Cuisine and Watchdog

Example: Monitoring system health

    from functools import reduce  # builtin in Python 2, needed on Python 3
    from watchdog import *
    Monitor (
        Service(
            name = "system-health",
            monitor = (
                SystemInfo(freq=Time.s(1),
                    success = (
                        LogResult("myserver.system.mem", extract=lambda r,_:r["memoryUsage"]),
                        LogResult("myserver.system.disk", extract=lambda r,_:reduce(max,r["diskUsage"].values())),
                        LogResult("myserver.system.cpu", extract=lambda r,_:r["cpuUsage"]),
                    )
                ),
                Delta(
                    Bandwidth("eth0", freq=Time.s(1)),
                    extract = lambda v:v["total"]["bytes"]/1000.0/1000.0,
                    success = [LogResult("myserver.system.eth0.sent")]
                ),
                SystemHealth(
                    cpu=0.90, disk=0.90, mem=0.90,
                    freq=Time.s(60),
                    fail=[Log(path="watchdog-system-failures.log")]
                ),
            )
        )
    ).run()

Page 149: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring system health

SystemInfo will retrieve system information and return it as a dictionary.

Page 150: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring system health

We log each result by extracting the given value from the result dictionary (memoryUsage, diskUsage, cpuUsage).

Page 151: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring system health

Bandwidth collects live traffic information for a network interface.

Page 152: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring system health

But we don't want the total amount, we just want the difference. Delta does just that.
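The idea behind Delta can be sketched as a small wrapper that remembers the previous sample and reports only the change. This is an illustration, not Watchdog's code: `DeltaSketch` is a hypothetical name, and the fake readings merely mimic the `["total"]["bytes"]` shape used on the slide.

```python
class DeltaSketch:
    """Turn a cumulative counter into per-sample differences (hypothetical)."""

    def __init__(self, read_counter, extract=lambda value: value):
        self.read_counter = read_counter  # callable returning the raw sample
        self.extract = extract            # picks the number of interest out of the sample
        self.previous = None

    def sample(self):
        current = self.extract(self.read_counter())
        # The first sample has nothing to be compared with.
        delta = None if self.previous is None else current - self.previous
        self.previous = current
        return delta


# Fake cumulative byte counts standing in for a network interface counter.
readings = iter([{"total": {"bytes": 1000000}},
                 {"total": {"bytes": 3500000}},
                 {"total": {"bytes": 4000000}}])
delta = DeltaSketch(lambda: next(readings),
                    extract=lambda v: v["total"]["bytes"] / 1000.0 / 1000.0)
samples = [delta.sample() for _ in range(3)]
```

Because the rule is sampled once per second, each difference reads as megabytes transferred during that second.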

Page 153: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring system health

We print the result as before.

Page 154: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring system health

SystemHealth will fail whenever the usage is above the given thresholds.
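A threshold test of this kind reduces to comparing each usage ratio with its limit. The function below is a hypothetical sketch (`exceeded` is not part of Watchdog), and the usage figures are made up for illustration.

```python
def exceeded(usage, thresholds):
    """Return the sorted names of the metrics whose usage is above their threshold."""
    return sorted(name for name, limit in thresholds.items()
                  if usage.get(name, 0.0) > limit)


thresholds = {"cpu": 0.90, "disk": 0.90, "mem": 0.90}
usage = {"cpu": 0.35, "disk": 0.97, "mem": 0.91}   # made-up sample values
failing = exceeded(usage, thresholds)
healthy = not failing   # the rule succeeds only if nothing is over its limit
```

Returning the names of the offending metrics, rather than a bare boolean, gives the failure action something useful to write in the log.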

Page 155: Server Administration in Python with Fabric, Cuisine and Watchdog

Monitoring system health

We'll log failures in a log file.

Page 156: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Decentralized architecture

APP SERVER

STATIC FILE SERVER

DB SERVER

Page 157: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Decentralized architecture

W on the APP SERVER: ensures the app is running (pid & HTTP test).

Page 158: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Decentralized architecture

W on the STATIC FILE SERVER: ensures the static file server is running and has low latency.

Page 159: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Decentralized architecture

W on the DB SERVER: ensures the DB is running and that queries are not too slow.

Page 160: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Centralized Architecture

APP SERVER

STATIC FILE SERVER

DB SERVER

Page 161: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Centralized Architecture

APP SERVER

STATIC FILE SERVER

DB SERVER

PLATFORM SERVER

Page 162: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Centralized Architecture

W on the PLATFORM SERVER: does high-level (HTTP, SQL) queries on the servers and executes actions remotely when problems are detected.

Page 163: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Deploying on Ubuntu

UPSTART!

Page 164: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Deploying on Ubuntu

    # upstart - Watchdog Configuration File
    # =====================================
    # updated: 2011-02-28

    description "Watchdog - service monitoring daemon"
    author "Sebastien Pierre <[email protected]>"

    start on (net-device-up and local-filesystems)
    stop on runlevel [016]

    respawn

    script
        # NOTE: Change this to wherever the watchdog is installed
        WATCHDOG_HOME=/opt/services/watchdog
        cd $WATCHDOG_HOME
        # NOTE: Change this to wherever your custom watchdog script is installed
        python watchdog.py
    end script

    console output
    # EOF

Page 165: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Deploying on Ubuntu

Save this file as /etc/init/watchdog.conf

Page 166: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Overview

Monitoring DSL
Declarative programming to define monitoring strategy

Wide spectrum
From data collection to incident detection

Flexible
Does not impose a specific architecture

Page 167: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: Use cases

Ensure service availability
Test and stop/restart when problems occur

Collect system statistics
Log or send data through the network

Alert on system or service health
Take actions when system stats are above a threshold

Page 168: Server Administration in Python with Fabric, Cuisine and Watchdog

Watchdog: What's coming?

ZeroMQ channels
Data streaming and inter-watchdog comm.

Documentation
Only the basics, need more love!

Contributors?
Codebase is small and clear, start hacking!

Page 169: Server Administration in Python with Fabric, Cuisine and Watchdog

Get started!

On GitHub:
http://github.com/sebastien/watchdog

1 Python file
Documented API

Page 170: Server Administration in Python with Fabric, Cuisine and Watchdog

Thank you!

[email protected]/sebastien