Advanced Deployment Scotland on Rails 2009 Jonathan Weiss, 28 March 2009 Peritor GmbH
Jan 15, 2015

Advanced Deployment by Jonathan Weiss presented at Scotland on Rails 2009 in Edinburgh. Deployment and Scaling best practices. See more at http://scotlandonrails.com/schedule/28-march/advanced-deployment/
Transcript
Page 1: Advanced Deployment

Advanced Deployment Scotland on Rails 2009

Jonathan Weiss, 28 March 2009

Peritor GmbH

Page 2: Advanced Deployment

Who am I?

Jonathan Weiss

•  Consultant for Peritor GmbH in Berlin

•  Specialized in Rails, Scaling, Deployment, and Code Review

•  Webistrano - Rails deployment tool

•  FreeBSD Rubygems and Ruby on Rails maintainer

http://www.peritor.com

http://blog.innerewut.de

Page 3: Advanced Deployment

Deployment

•  Architecture

•  Process

Page 4: Advanced Deployment

Deployment Process Requirements

•  Automatic

•  Reproducible

•  Accountable

•  Notifications

Page 5: Advanced Deployment

Deployment Tools

Several tools available

•  Capistrano

•  Webistrano

•  Vlad

•  Puppet

•  Chef

The deployment process is usually not that complicated

Page 6: Advanced Deployment

Architecture

Page 7: Advanced Deployment

How deployment starts out …

Page 8: Advanced Deployment

… and how it ends

Page 9: Advanced Deployment

Agenda

Search

Background Processing

Scaling the database

Multiple Client Installations

Cloud Infrastructure

Page 10: Advanced Deployment

General Advice

Simple is better than complex

Page 11: Advanced Deployment

Search

Page 12: Advanced Deployment

Search

Full text search

Can become very slow on big data sets

Page 13: Advanced Deployment

Full Text Search Engine

Separate Service

•  Creates full text index

•  Application queries search daemon

•  Index update through application or database

Possible Engines

•  Ferret

•  Sphinx

•  Solr

•  Lucene

•  …
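
To make "creates full text index" concrete, here is a toy inverted index in Ruby. It is only a sketch of the idea behind the engines listed above, not how any of them is implemented; the class name TinyIndex and its methods are made up for illustration.

```ruby
# Toy inverted index: maps each word to the ids of the documents containing it.
# Real engines add stemming, ranking, persistence, and incremental updates.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  # Add one document to the index (each document id is assumed to be added once).
  def add(doc_id, text)
    tokenize(text).uniq.each { |word| @postings[word] << doc_id }
  end

  # Return the ids of documents that contain every word of the query.
  def search(query)
    words = tokenize(query)
    return [] if words.empty?
    words.map { |word| @postings.fetch(word, []) }.reduce(:&)
  end

  private

  # Lowercase the text and split it into alphanumeric words.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end
```

The application would send queries to such an index instead of running slow LIKE queries against the main database. How the index is kept up to date (from the application or from the database) is the part that differs most between engines.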

Page 14: Advanced Deployment

Search Slave

Database replication slave

•  Has complete dataset

•  Moves slow search queries off the master

•  Can use different database table engine

Page 15: Advanced Deployment

Database Index

PostgreSQL Tsearch2

•  Core since 8.3

•  Allows creating a full text index on multiple columns or arbitrary SQL expressions

MySQL MyISAM FULLTEXT index

•  Only works with MyISAM tables (InnoDB gained FULLTEXT support only in MySQL 5.6)

•  Full text index on multiple columns

Page 16: Advanced Deployment

What to use?

Different characteristics

•  Real-time updates and stale data

•  Lost updates

•  Performance

•  Document content and format

•  Complexity

Page 17: Advanced Deployment

Background Processing

Page 18: Advanced Deployment

Problem

Long running tasks

•  Resizing uploaded images

•  Mailing

•  Computing an expensive operation

•  Accessing slow back-ends

When running inside request-response-cycle

•  Blocks user

•  Blocks Rails instance

•  Hard to monitor and debug

Page 19: Advanced Deployment

Solution

Asynchronous processing in the background

•  Message/Queue

•  Scheduler

Page 20: Advanced Deployment

Background Processing

Page 21: Advanced Deployment

Options

Options for message bus:

•  Database

•  Amazon SQS

•  DRb

•  Memcache

•  ActiveMQ

•  …

Options for background process:

•  (Ruby) Daemon

•  Cron job with script/runner

•  Forked process

•  Delayed Job / BJ / (BackgrounDRb)

•  run_later

•  …

Page 22: Advanced Deployment

Database/Ruby daemon example
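
The original slide showed this setup as a diagram. A minimal Ruby sketch of the same idea follows: the web application only records a job, and a separate daemon polls for pending jobs and works them off. All names (Job, JobTable, Worker) are hypothetical, and an in-memory array stands in for the jobs table so the sketch runs without a database.

```ruby
# Stand-in for a "jobs" database table. In a Rails app this would be an
# ActiveRecord model; an in-memory array keeps the sketch self-contained.
Job = Struct.new(:id, :kind, :payload, :status, :error)

class JobTable
  def initialize
    @rows = []
    @lock = Mutex.new
  end

  # Called by the web application inside the request: just record the work.
  def enqueue(kind, payload)
    @lock.synchronize do
      job = Job.new(@rows.size + 1, kind, payload, :pending, nil)
      @rows << job
      job
    end
  end

  # Called by the daemon: claim the oldest pending job, if any.
  def claim_next
    @lock.synchronize do
      job = @rows.find { |row| row.status == :pending }
      job.status = :running if job
      job
    end
  end
end

class Worker
  # handlers maps a job kind to a callable that does the actual work.
  def initialize(table, handlers)
    @table = table
    @handlers = handlers
  end

  # Process one job; returns false when there was nothing to do.
  def work_one
    job = @table.claim_next
    return false unless job
    begin
      @handlers.fetch(job.kind).call(job.payload)
      job.status = :done
    rescue StandardError => e
      # Record the failure so it can be monitored instead of killing the daemon.
      job.status = :failed
      job.error = e.message
    end
    true
  end

  # Daemon loop: poll the table and sleep while the queue is empty.
  def run(poll_interval = 5)
    loop { sleep(poll_interval) unless work_one }
  end
end
```

Keeping the status and error on the job row addresses the "hard to monitor and debug" problem from the earlier slide: failed jobs stay visible and can be retried.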

Page 23: Advanced Deployment

Scaling the database

Page 24: Advanced Deployment

Scaling the database

One database for everything

•  All domain data in one place

•  The simplest solution

Problems at some point

•  Number of read and write requests

•  Data size

Page 25: Advanced Deployment

Scaling the database

Read Slave

•  Slave replicates each SQL statement executed on the master

•  Increase read performance by reading from the replicating slave

•  Stale read problem

•  Better used explicitly, but then you have to think about which queries can read from the slave

Better: use memcached
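
Using a read slave explicitly could look like the following Ruby sketch, where the caller states whether a read may be stale. ConnectionRouter and its methods are hypothetical names, not a Rails API, and the connections are plain placeholders.

```ruby
# Hypothetical router: the caller decides per query whether stale data is acceptable.
class ConnectionRouter
  def initialize(master:, slaves:)
    @master = master
    @slaves = slaves
    @next = 0
  end

  # All writes go to the master.
  def for_write
    @master
  end

  # Reads that must see the latest data (fresh: true) also go to the master;
  # reads that tolerate replication lag are spread over the slaves round-robin.
  def for_read(fresh: false)
    return @master if fresh || @slaves.empty?
    slave = @slaves[@next % @slaves.size]
    @next += 1
    slave
  end
end
```

The fresh flag is exactly the part that "makes you think": every call site has to decide whether it can live with the stale read problem.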

Page 26: Advanced Deployment

Scaling the database

Master-Master

•  Increase write and read performance

•  Each server is a slave of the other

•  Synchronization can be tricky

•  Limited by database size

Better for HA than for write performance

Page 27: Advanced Deployment

Data Partitioning

Partition on domain models

•  Separate users and products

•  Makes sense if JOINs are rare

•  Scales reads/writes

•  Reduces data size per database

•  Depends on separate domains

Simple and effective

Page 28: Advanced Deployment

Data Partitioning

Sharding

•  Split data into shards

    •  All tables

    •  Only big ones like users

•  Partition by id, hash function or lookup

•  Complex and makes JOINs complicated

•  Scales reads/writes

•  Reduces data size per database

Page 29: Advanced Deployment

Data Partitioning

Sharding

Last resort
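
The three partitioning options named for sharding (by id, by hash function, by lookup) can be sketched in Ruby as below. Shard names, the block size per shard, and the lookup entries are all assumptions for illustration.

```ruby
require "zlib"

# Hypothetical shard names; in practice these would map to database connections.
SHARDS = %w[shard_0 shard_1 shard_2 shard_3].freeze

# 1. By id: each shard owns a contiguous block of ids (ids assumed to start at 1).
#    Raises IndexError once the ids outgrow the configured shards.
def shard_by_range(user_id, ids_per_shard = 1_000_000)
  SHARDS.fetch((user_id - 1) / ids_per_shard)
end

# 2. By hash function: CRC32 is stable across processes, unlike Object#hash.
def shard_by_hash(key)
  SHARDS[Zlib.crc32(key.to_s) % SHARDS.size]
end

# 3. By lookup: an explicit directory that records where each key lives.
LOOKUP = { "alice" => "shard_2", "bob" => "shard_0" }.freeze

def shard_by_lookup(login)
  LOOKUP.fetch(login)
end
```

The trade-offs show up quickly: hashing spreads load evenly but forces data to move when the number of shards changes, ranges are easy to extend but can create hot spots, and a lookup table is the most flexible but becomes one more component to keep available.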

Page 30: Advanced Deployment

Alternatives

Data size is often the bigger problem

•  Reduce data size

•  Archiving

Page 31: Advanced Deployment

Archiving

Get rid of (historical) data

•  Delete old data

•  Aggregate old data

•  Partition old data

Have an archiving policy from the start
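
As one example of "aggregate old data", the Ruby sketch below collapses rows older than a cutoff date into monthly totals and keeps only the recent rows in full detail. The PageView record and the monthly granularity are assumptions chosen for illustration.

```ruby
require "date"

# Hypothetical raw row: one record per page view.
PageView = Struct.new(:page, :viewed_on)

# Splits rows at the cutoff date. Recent rows are returned unchanged; older rows
# are reduced to a count per page and month, which is all that would be stored.
def archive(rows, cutoff)
  old, recent = rows.partition { |row| row.viewed_on < cutoff }
  totals = Hash.new(0)
  old.each { |row| totals[[row.page, row.viewed_on.strftime("%Y-%m")]] += 1 }
  [recent, totals]
end
```

Run periodically, a policy like this bounds the size of the detail table no matter how long the application has been live.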

Page 32: Advanced Deployment

Reduce data size

Avoid exponential data growth

•  Do not store data in the database, move it to

    •  File system

    •  S3

    •  SimpleDB

•  Do not normalize data

    •  Duplicate data in order to remove JOINs (and JOIN tables)

•  Combine indices

Page 33: Advanced Deployment

Multiple clients

Page 34: Advanced Deployment

Multiple Clients

NOT the same as multiple users

A client is more like a separate domain, e.g. an expansion to another country

•  Different settings

•  Different themes

•  Different features enabled

•  Different language

•  Different audience

How to combine in one app?

Page 35: Advanced Deployment

Multiple Clients

Questions to ask

•  How many different clients?

•  Is there shared state (users, settings, posts, …)?

•  What is the expected data size and growth of each client?

Page 36: Advanced Deployment

Multiple Clients

The easy way to maintenance hell

•  Fork the code

•  One branch per client

•  One install per client

Page 37: Advanced Deployment

Multiple Clients

Same code – same database

•  Move different behavior into configuration

•  Move configuration into database

•  Scope data by DB-column

•  Scope all data requests in the code
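
Scoping data by a column can be sketched in plain Ruby as follows. Post, ClientScope, and the client_id column are hypothetical; in a Rails app the same effect usually comes from querying through an association on the current client record instead of calling the model class directly.

```ruby
# Hypothetical row with a client_id column that marks which client owns it.
Post = Struct.new(:id, :client_id, :title)

# Wraps a collection so that every query is restricted to one client.
class ClientScope
  def initialize(rows, client_id)
    @rows = rows
    @client_id = client_id
  end

  # All rows that belong to this client.
  def all
    @rows.select { |row| row.client_id == @client_id }
  end

  # Lookup by id that cannot return another client's row.
  def find(id)
    all.find { |row| row.id == id }
  end
end
```

The point of routing every request through such a scope is that forgetting the filter in one place no longer leaks data between clients.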

Page 38: Advanced Deployment

Multiple Clients

Same code – partition the data

•  Move different behavior into configuration

•  Partition data by database

Hardcode database while booting

Page 39: Advanced Deployment

Multiple Clients

Same code – partition the data

•  Move different behavior into configuration

•  Partition data by database

Choose database dynamically
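
A minimal Ruby sketch of choosing the database dynamically, assuming the client is identified by the request host name. The host names, database names, and the helper database_for are made up for illustration.

```ruby
# Hypothetical per-client settings, shaped like entries in database.yml.
CLIENT_DATABASES = {
  "uk.example.com" => { "adapter" => "mysql", "database" => "app_uk", "host" => "db-uk.internal" },
  "de.example.com" => { "adapter" => "mysql", "database" => "app_de", "host" => "db-de.internal" }
}.freeze

# Picks the database settings for a request based on its host name.
def database_for(request_host)
  CLIENT_DATABASES.fetch(request_host.downcase) do
    raise ArgumentError, "no client configured for #{request_host}"
  end
end
```

In a Rails app the returned hash could be handed to ActiveRecord::Base.establish_connection, for example from a filter that runs before each request. Switching connections per request has a cost, so this is a sketch of the mechanism rather than a tuned setup.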

Page 40: Advanced Deployment

Multiple Clients

Generate local databases

•  Import global content into master DB

•  Push shared content in the correct format to app DBs

•  Build reverse channel if needed

Page 41: Advanced Deployment

Cloud Infrastructure

Page 42: Advanced Deployment

Cloud Infrastructure

Servers come and go

•  You do not know your servers before deploying

•  Restarting is the same as introducing a new machine

You can’t hardcode IPs

database.yml

Page 43: Advanced Deployment

Solution #1

Query and manually adjust

•  Servers do not change that often

•  New nodes probably need manual intervention

•  Use AWS ElasticIPs to ease the pain

•  Set servers dynamically

•  AWS Elastic IP

Page 44: Advanced Deployment

Solution #2

Use a central directory service

•  A central place to manage your running instances

•  Instances query the directory and react

Page 46: Advanced Deployment

Central Directory

Different Implementations

•  File on S3

•  SimpleDB

•  A complete service, capable of monitoring and controlling your instances
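
For the simplest variant, a file that each instance fetches and reacts to, a Ruby sketch might look like this. The JSON layout (role name mapped to a list of addresses), the adapter, and the database name are assumptions; fetching the file from S3 is left out.

```ruby
require "json"
require "yaml"

# Builds the content of database.yml from a directory document.
# Assumed layout: {"database": ["10.0.0.5"], "app": ["10.0.0.7", ...]}
def database_yml_from(directory_json, environment = "production")
  directory = JSON.parse(directory_json)
  db_host = directory.fetch("database").first
  raise "directory lists no database instance" unless db_host
  { environment => { "adapter" => "mysql", "database" => "app", "host" => db_host } }.to_yaml
end
```

An instance would run this at boot, and again whenever the directory changes, so that no IP address has to be hardcoded in the deployed code.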

Page 47: Advanced Deployment

Summary

Simple is better than complex

Carefully evaluate the different solutions

Only introduce a new component if you really need to

Everything has strings attached

Solving the data size problem often solves others too

Page 48: Advanced Deployment

Questions?

Page 49: Advanced Deployment

Peritor GmbH

Teutonenstraße 16, 14129 Berlin

Phone: +49 (0)30 69 20 09 84 0 Fax: +49 (0)30 69 20 09 84 9

Web: www.peritor.com Email: [email protected]

Peritor GmbH - All rights reserved