Page 1: Advanced Deployment

Advanced Deployment

Scotland on Rails 2009

Jonathan Weiss, 28 March 2009

Peritor GmbH

Page 2: Advanced Deployment

Who am I?

Jonathan Weiss

•  Consultant for Peritor GmbH in Berlin

•  Specialized in Rails, Scaling, Deployment, and Code Review

•  Webistrano - Rails deployment tool

•  FreeBSD RubyGems and Ruby on Rails maintainer

http://www.peritor.com

http://blog.innerewut.de

Page 3: Advanced Deployment

Deployment

•  Architecture

•  Process

Page 4: Advanced Deployment

Deployment Process Requirements

•  Automatic

•  Reproducible

•  Accountable

•  Notifications

Page 5: Advanced Deployment

Deployment Tools

Several tools available

•  Capistrano

•  Webistrano

•  Vlad

•  Puppet

•  Chef

The deployment process is usually not that complicated

Page 6: Advanced Deployment

Architecture

Page 7: Advanced Deployment

How deployment starts out …

Page 8: Advanced Deployment

… and how it ends

Page 9: Advanced Deployment

Agenda

Search

Background Processing

Scaling the database

Multiple Client Installations

Cloud Infrastructure

Page 10: Advanced Deployment

General Advice

Simple is better than complex

Page 11: Advanced Deployment

Search

Page 12: Advanced Deployment

Search

Full text search

Can become very slow on big data sets

Page 13: Advanced Deployment

Full Text Search Engine

Separate Service

•  Creates full text index

•  Application queries search daemon

•  Index update through application or database

Possible Engines

•  Ferret

•  Sphinx

•  Solr

•  Lucene

•  …
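The core idea behind these engines is an inverted index: a map from each term to the documents that contain it, built ahead of time so that a query does not have to scan every row. A minimal sketch in plain Ruby, assuming a naive lowercase tokenizer; the `TinyIndex` class and its methods are hypothetical and are not the API of Ferret, Sphinx, Solr, or Lucene:

```ruby
require "set"

# Toy inverted index: term => set of document ids.
# Hypothetical illustration only; real engines add stemming, ranking, persistence.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  # Add the terms of one document to the index.
  def add(doc_id, text)
    tokenize(text).each { |term| @postings[term] << doc_id }
  end

  # Return the sorted ids of documents containing ALL query terms.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    terms.map { |term| @postings.fetch(term, Set.new) }
         .reduce { |acc, ids| acc & ids }
         .to_a
         .sort
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = TinyIndex.new
index.add(1, "Deploying Rails with Capistrano")
index.add(2, "Scaling Rails databases")
index.add(3, "Capistrano recipes")

puts index.search("rails").inspect
puts index.search("rails capistrano").inspect
```

The application only talks to `search`; how and when `add` is called (from the application or from the database) is the "index update" question raised on the slide.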

Page 14: Advanced Deployment

Search Slave

Database replication slave

•  Has complete dataset

•  Moves slow search queries off the master

•  Can use different database table engine

Page 15: Advanced Deployment

Database Index

PostgreSQL Tsearch2

•  Core since 8.3

•  Allows creating a full-text index on multiple columns or arbitrary SQL expressions

MySQL MyISAM FULLTEXT index

•  Only works with MySQL <= 5.0 and MyISAM tables

•  Full text index on multiple columns

Page 16: Advanced Deployment

What to use?

Different characteristics

•  Real-time updates and stale data

•  Lost updates

•  Performance

•  Document content and format

•  Complexity

Page 17: Advanced Deployment

Background Processing

Page 18: Advanced Deployment

Problem

Long running tasks

•  Resizing uploaded images

•  Mailing

•  Computing an expensive operation

•  Accessing slow back-ends

When running inside request-response-cycle

•  Blocks user

•  Blocks Rails instance

•  Hard to monitor and debug

Page 19: Advanced Deployment

Solution

Asynchronous processing in the background

Message/Queue Scheduler
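As a rough illustration of the idea, the request side only enqueues a message and returns, while a separate worker does the slow part. This is a minimal in-process sketch using Ruby's built-in `Queue` and a thread; the job format and the `handle_upload` helper are assumptions made for the example, and a production setup would use one of the message buses and worker types listed in the options that follow:

```ruby
# Minimal in-process sketch: the request handler only enqueues work,
# a background thread does the slow part. Names are hypothetical.
jobs    = Queue.new   # thread-safe FIFO from Ruby's core library
results = Queue.new

worker = Thread.new do
  # Queue#pop blocks until a job arrives; :stop ends the loop.
  while (job = jobs.pop) != :stop
    results << "resized #{job[:image]} to #{job[:width]}px"
  end
end

# "Request" side: returns immediately after enqueueing.
def handle_upload(jobs, image)
  jobs << { image: image, width: 200 }
  :accepted
end

status = handle_upload(jobs, "avatar.png")
jobs << :stop
worker.join

processed = []
processed << results.pop until results.empty?
puts status.inspect
puts processed.inspect
```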

Page 20: Advanced Deployment

Background Processing

Page 21: Advanced Deployment

Options

Options for message bus:

•  Database

•  Amazon SQS

•  DRb

•  Memcache

•  ActiveMQ

•  …

Options for background process:

•  (Ruby) Daemon

•  Cron job with script/runner

•  Forked process

•  Delayed Job / BJ / (BackgrounDRb)

•  run_later

•  …

Page 22: Advanced Deployment

Database/Ruby daemon example
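The slide's original example did not survive in this transcript. The following is a sketch of what the combination can look like, not the speaker's code: the application inserts rows into a jobs table, and a daemon claims pending rows and works them off. The `JobTable` class is a hypothetical in-memory stand-in for the database table, and a real multi-process setup would need the claim step to be atomic (for example via row locking):

```ruby
# Stand-in for a `jobs` database table; in a real setup each step below
# would be an SQL statement. All names here are hypothetical.
class JobTable
  def initialize
    @rows = []
    @next_id = 1
  end

  def insert(kind, payload)
    row = { id: @next_id, kind: kind, payload: payload, state: "pending", result: nil }
    @next_id += 1
    @rows << row
    row[:id]
  end

  # Oldest pending job, marked as running so it is not picked up twice.
  def claim_next
    row = @rows.find { |r| r[:state] == "pending" }
    row[:state] = "running" if row
    row
  end

  def finish(id, result)
    row = @rows.find { |r| r[:id] == id }
    row[:state] = "done"
    row[:result] = result
  end

  def states
    @rows.map { |r| r[:state] }
  end
end

# One pass of the daemon loop: work off everything that is pending.
def work_off(table, handlers)
  done = 0
  while (job = table.claim_next)
    table.finish(job[:id], handlers.fetch(job[:kind]).call(job[:payload]))
    done += 1
  end
  done
end

table = JobTable.new
handlers = { "mail" => ->(to) { "mailed #{to}" } }
table.insert("mail", "alice@example.com")
table.insert("mail", "bob@example.com")

# A real daemon would wrap this in `loop { work_off(table, handlers); sleep 5 }`.
processed = work_off(table, handlers)
puts processed
puts table.states.inspect
```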

Page 23: Advanced Deployment

Scaling the database

Page 24: Advanced Deployment

Scaling the database

One database for everything

•  All domain data in one place

•  The simplest solution

Problems at some point

•  Number of read and write requests

•  Data size

Page 25: Advanced Deployment

Scaling the database

Read Slave

•  The slave replays each SQL statement executed on the master

•  Increases read performance by reading from the replicating slave

•  Stale read problem

•  Works better when used explicitly, but then every read makes you think about whether stale data is acceptable

Better: use memcached
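To make the stale read problem concrete, here is a small simulation under stated assumptions: `FakeDb` and `ReplicatedStore` are hypothetical stand-ins for two database connections, and `replicate!` models the moment the slave catches up. Reads go to the slave unless the caller explicitly asks for fresh data:

```ruby
# Hypothetical stand-ins for two database connections. The replica only
# sees writes after `replicate!` runs, which models replication lag.
class FakeDb
  attr_reader :rows

  def initialize
    @rows = {}
  end
end

class ReplicatedStore
  def initialize
    @master  = FakeDb.new
    @replica = FakeDb.new
  end

  # Writes always go to the master.
  def write(key, value)
    @master.rows[key] = value
  end

  # Reads go to the replica unless the caller explicitly asks for fresh data.
  def read(key, fresh: false)
    (fresh ? @master : @replica).rows[key]
  end

  # Simulates the replica catching up with the master.
  def replicate!
    @replica.rows.replace(@master.rows)
  end
end

store = ReplicatedStore.new
store.write(:name, "Ada")
stale = store.read(:name)               # replica has not caught up yet
fresh = store.read(:name, fresh: true)  # explicit read from the master
store.replicate!
caught_up = store.read(:name)
puts [stale, fresh, caught_up].inspect
```

The `fresh:` flag is the "used explicitly" part: the caller has to decide, per read, whether lagging data is good enough.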

Page 26: Advanced Deployment

Scaling the database

Master-Master

•  Increase write and read performance

•  Each server is a slave of the other

•  Synchronization can be tricky

•  Limited by database size

Better for HA than for write performance

Page 27: Advanced Deployment

Data Partitioning

Partition on domain models

•  Separate users and products

•  Makes sense if JOINs are rare

•  Scales reads/writes

•  Reduces data size per database

•  Depends on separate domains

Simple and effective
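Once users and products live in different databases, a cross-database JOIN is no longer available, so the few remaining joins move into the application as two lookups. A minimal sketch, with two hashes standing in for the two databases (all names and fields are made up for the example):

```ruby
# Two hashes stand in for two separate databases (hypothetical data).
users_db    = { 1 => { name: "Ada" } }                  # lives on server A
products_db = { 10 => { title: "Book", owner_id: 1 } }  # lives on server B

# No cross-database JOIN: fetch from one database, then look up in the other.
def product_with_owner(products_db, users_db, product_id)
  product = products_db.fetch(product_id)
  owner   = users_db.fetch(product[:owner_id])
  product.merge(owner_name: owner[:name])
end

combined = product_with_owner(products_db, users_db, 10)
puts combined.inspect
```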

Page 28: Advanced Deployment

Data Partitioning

Sharding

•  Split data into shards

   •  All tables

   •  Only big ones like users

•  Partition by id, hash function or lookup

•  Complex and makes JOINs complicated

•  Scales reads/writes

•  Reduces data size per database

Last resort
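The three partitioning strategies named above (by id, by hash function, by lookup) can be sketched as follows. The shard names, the lookup table, and the choice of CRC32 as the hash function are assumptions for illustration only:

```ruby
require "zlib"

SHARDS = %w[shard_0 shard_1 shard_2].freeze  # hypothetical database names

# By id: simple modulo on a numeric primary key.
def shard_by_id(user_id)
  SHARDS[user_id % SHARDS.size]
end

# By hash function: works for non-numeric keys such as e-mail addresses.
# CRC32 is stable across processes, unlike Ruby's String#hash.
def shard_by_hash(key)
  SHARDS[Zlib.crc32(key) % SHARDS.size]
end

# By lookup: an explicit directory; flexible, but one more thing to maintain.
SHARD_DIRECTORY = { "acme" => "shard_2", "initech" => "shard_0" }.freeze

def shard_by_lookup(account)
  SHARD_DIRECTORY.fetch(account)
end

puts shard_by_id(7)
puts shard_by_hash("alice@example.com")
puts shard_by_lookup("acme")
```

Note that with the modulo and hash variants, changing the number of shards moves most keys to a different shard, which is part of why sharding is complex.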

Page 30: Advanced Deployment

Alternatives

Data size is often the bigger problem

•  Reduce data size

•  Archiving

Page 31: Advanced Deployment

Archiving

Get rid of (historical) data

•  Delete old data

•  Aggregate old data

•  Partition old data

Have an archiving policy from the start
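As an example of the "aggregate old data" option: rows older than a cutoff are replaced by one summary row per day, while recent rows are kept in full. A minimal sketch with made-up page-view records; the record layout and the `archive` helper are hypothetical:

```ruby
require "date"

# Hypothetical page-view records; in practice these would be table rows.
views = [
  { day: Date.new(2008, 1, 1), page: "/a" },
  { day: Date.new(2008, 1, 1), page: "/b" },
  { day: Date.new(2008, 1, 2), page: "/a" },
  { day: Date.new(2009, 3, 1), page: "/a" }
]

# Replace rows older than `cutoff` with one count per day; keep recent rows.
def archive(views, cutoff)
  old, recent = views.partition { |v| v[:day] < cutoff }
  daily_totals = old.group_by { |v| v[:day] }
                    .map { |day, rows| { day: day, views: rows.size } }
  [daily_totals, recent]
end

totals, recent = archive(views, Date.new(2009, 1, 1))
puts totals.map { |t| "#{t[:day]}: #{t[:views]}" }
puts recent.size
```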

Page 32: Advanced Deployment

Reduce data size

Avoid exponential data growth

•  Do not store data in the database; move it to

   •  File system

   •  S3

   •  SimpleDB

•  Do not normalize data

•  Duplicate data in order to remove JOINs (and JOIN tables)

•  Combine indices

Page 33: Advanced Deployment

Multiple clients

Page 34: Advanced Deployment

Multiple Clients

NOT the same as multiple users

A client is more like a separate domain, e.g. an expansion into another country

•  Different settings

•  Different themes

•  Different features enabled

•  Different language

•  Different audience

How to combine in one app?

Page 35: Advanced Deployment

Multiple Clients

Questions to ask

•  How many different clients?

•  Is there shared state (users, settings, posts, …)?

•  What is the expected data size and growth of each client?

Page 36: Advanced Deployment

Multiple Clients

The easy way to maintenance hell

•  Fork the code

•  One branch per client

•  One install per client

Page 37: Advanced Deployment

Multiple Clients

Same code – same database

•  Move different behavior into configuration

•  Move configuration into database

•  Scope data by DB-column

•  Scope all data requests in the code
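One way to read "scope data by DB-column" is that every row carries a client identifier and every query is forced through a client filter. A plain-Ruby sketch under that assumption; the `Post` struct, the `client_id` column name, and the repository classes are hypothetical (in a Rails application this would typically be done through associations or scopes):

```ruby
Post = Struct.new(:client_id, :title)

# Hypothetical repository: every query goes through `for_client`,
# so no code path can forget the client_id filter.
class PostRepository
  # The only object that exposes post data, already limited to one client.
  class ClientScope
    def initialize(posts)
      @posts = posts
    end

    def titles
      @posts.map(&:title)
    end
  end

  def initialize(posts)
    @posts = posts
  end

  def for_client(client_id)
    ClientScope.new(@posts.select { |post| post.client_id == client_id })
  end
end

repo = PostRepository.new([
  Post.new("uk", "Hello"),
  Post.new("de", "Hallo"),
  Post.new("uk", "Cheers")
])
puts repo.for_client("uk").titles.inspect
```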

Page 38: Advanced Deployment

Multiple Clients

Same code – partition the data

•  Move different behavior into configuration

•  Partition data by database

Hardcode the database at boot time
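A sketch of what fixing the database at boot can look like: each installation runs the same code, and an environment variable selects the client's database once at startup. The `CLIENT` variable name, the settings table, and the host names are assumptions for the example:

```ruby
# Hypothetical per-client settings; CLIENT would be set in the environment
# of each installation, so one code base serves every client.
CLIENT_DATABASES = {
  "uk" => { "database" => "app_uk", "host" => "db-uk.internal" },
  "de" => { "database" => "app_de", "host" => "db-de.internal" }
}.freeze

# Called once while booting; fails loudly if the client is missing or unknown.
def database_config(env = ENV)
  client = env.fetch("CLIENT") { raise KeyError, "CLIENT is not set" }
  CLIENT_DATABASES.fetch(client)
end

config = database_config("CLIENT" => "de")
puts config["database"]
```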

Page 39: Advanced Deployment

Multiple Clients

Same code – partition the data

•  Move different behavior into configuration

•  Partition data by database

Choose database dynamically
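Choosing the database dynamically usually means deciding per request, for instance from the host name the request arrived on. A minimal sketch of that decision only; the host table and database names are hypothetical, and switching the actual connection is left out:

```ruby
# Hypothetical host-to-database table, consulted on every request.
DATABASE_BY_HOST = {
  "example.co.uk" => "app_uk",
  "example.de"    => "app_de"
}.freeze

# Returns the database name for a request's Host header.
def database_for_host(host)
  name = host.to_s.downcase.sub(/:\d+\z/, "").sub(/\Awww\./, "")
  DATABASE_BY_HOST.fetch(name) { raise ArgumentError, "unknown client host: #{host}" }
end

puts database_for_host("www.example.de")
puts database_for_host("Example.co.uk:8080")
```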

Page 40: Advanced Deployment

Multiple Clients

Generate local databases

•  Import global content into master DB

•  Push shared content in the correct format to app DBs

•  Build reverse channel if needed

Page 41: Advanced Deployment

Cloud Infrastructure

Page 42: Advanced Deployment

Cloud Infrastructure

Servers come and go

•  You do not know your servers before deploying

•  Restarting is the same as introducing a new machine

You can’t hardcode IPs in database.yml
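Rails runs database.yml through ERB before parsing it, so the host does not have to be a literal in the file. The sketch below renders such a template outside of Rails to show the effect; the `DB_HOST` name, the template contents, and the `render_database_yml` helper are assumptions for illustration (inside Rails the template would typically read from `ENV` directly):

```ruby
require "erb"
require "yaml"

# database.yml is passed through ERB by Rails, so the host can be looked up
# at boot. DB_HOST is a hypothetical variable name for this sketch.
TEMPLATE = <<~YAML
  production:
    adapter: mysql
    database: app_production
    host: <%= env.fetch("DB_HOST", "localhost") %>
YAML

# Render the ERB template with the given settings and parse the YAML result.
def render_database_yml(template, env)
  YAML.safe_load(ERB.new(template).result_with_hash(env: env))
end

config = render_database_yml(TEMPLATE, "DB_HOST" => "10.0.0.12")
puts config["production"]["host"]
```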

Page 43: Advanced Deployment

Solution #1

Query and manually adjust

•  Servers do not change that often

•  New nodes probably need manual intervention

•  Use AWS Elastic IPs to ease the pain

Set servers dynamically

AWS Elastic IP

Page 44: Advanced Deployment

Solution #2

Use a central directory service

•  A central place to manage your running instances

•  Instances query the directory and react

Page 46: Advanced Deployment

Central Directory

Different Implementations

•  File on S3

•  SimpleDB

•  A complete service, capable of monitoring and controlling your instances
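The simplest of these variants, a single shared document, can be sketched as follows. Instances register themselves when they boot, and other instances load the document and look up peers by role. The `InstanceDirectory` class, the JSON layout, and the instance ids are hypothetical; fetching and storing the document (for example on S3) is deliberately left out:

```ruby
require "json"

# Hypothetical directory kept as one JSON document, e.g. a file on S3.
# Reading and writing the document is left out; only the logic is shown.
class InstanceDirectory
  def initialize(json = "{}")
    @instances = JSON.parse(json)
  end

  # Called by an instance when it boots.
  def register(id, role, address)
    @instances[id] = { "role" => role, "address" => address }
  end

  # Called when an instance shuts down or is found dead.
  def deregister(id)
    @instances.delete(id)
  end

  # Sorted addresses of all registered instances with the given role.
  def addresses_for(role)
    @instances.values
              .select { |instance| instance["role"] == role }
              .map { |instance| instance["address"] }
              .sort
  end

  def to_json(*args)
    @instances.to_json(*args)
  end
end

directory = InstanceDirectory.new
directory.register("i-1", "db", "10.0.0.5")
directory.register("i-2", "app", "10.0.0.6")
directory.register("i-3", "app", "10.0.0.7")
directory.deregister("i-2")

# Another instance loads the stored document and reacts to what it finds.
reloaded = InstanceDirectory.new(directory.to_json)
puts reloaded.addresses_for("app").inspect
puts reloaded.addresses_for("db").inspect
```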

Page 47: Advanced Deployment

Summary

Simple is better than complex

Carefully evaluate the different solutions

Only introduce a new component if you really need to

Everything comes with strings attached

Solving the data size problem often solves others too

Page 48: Advanced Deployment

Questions?

Page 49: Advanced Deployment

Peritor GmbH

Teutonenstraße 16, 14129 Berlin

Phone: +49 (0)30 69 20 09 84 0, Fax: +49 (0)30 69 20 09 84 9

Internet: www.peritor.com, E-mail: [email protected]

Peritor GmbH - All rights reserved

