PostgreSQL Scaling And Failover

May 10, 2015

Technology

John Paulett

Overview of PostgreSQL scaling and high availability options.
Transcript
Page 1: PostgreSQL Scaling And Failover

PostgreSQL

John Paulett

October 26, 2009

High Availability & Scaling

Page 2: PostgreSQL Scaling And Failover

Overview

Scaling Overview
– Horizontal & Vertical Options

High Availability Overview

Other Options

Suggested Architecture

Hardware Discussion

Page 4: PostgreSQL Scaling And Failover

What are we trying to solve?

Survive server failure?
– Support an uptime SLA (e.g. 99.9999%)?

Application scaling?
– Support additional application demand

→ Many options, each optimized for different constraints
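
To make the SLA figure concrete, the sketch below converts an uptime percentage into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration; they are not from the slides.

```python
# Back-of-envelope SLA arithmetic (assumes a 365-day year).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(uptime_percent: float) -> float:
    """Return the seconds of downtime per year that an uptime target permits."""
    return SECONDS_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.9999):
        seconds = allowed_downtime_seconds(target)
        print(f"{target}% uptime allows about {seconds:,.1f} seconds of downtime per year")
```

A 99.9999% target leaves roughly half a minute per year, which is less time than most manual failover procedures take.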

Page 5: PostgreSQL Scaling And Failover

Scaling Overview

Page 6: PostgreSQL Scaling And Failover

How To Scale

Horizontal Scaling
– “Google” approach
– Distribute load across multiple servers
– Requires appropriate application architecture

Vertical Scaling
– “Big Iron” approach
– Single, massive machine (lots of fast processors, RAM, & hard drives)

Page 7: PostgreSQL Scaling And Failover

Horizontal DB Scaling

Load Balancing
– Distribute operations to multiple servers

Partitioning
– Split the rows (horizontal) or the tables (vertical) and put them on separate servers
– aka “sharding”

Page 8: PostgreSQL Scaling And Failover

Basic Problem when Load Balancing

Difficult to maintain consistent state between servers (remember ACID), especially when dealing with writes

4 PostgreSQL Load Balancing Methods:
– Master-Slave Replication
– Statement-Based Replication Middleware
– Asynchronous Multimaster Replication
– Synchronous Multimaster Replication

Page 9: PostgreSQL Scaling And Failover

Master-Slave Replication

Master handles writes, slaves handle reads

Asynchronous replication
– Possible data loss on master failure

Slony-I
– Does not automatically propagate schema changes
– Does not offer single connection point
– Requires separate solution for master failures
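
Because Slony-I offers no single connection point, the application (or a proxy in front of it) has to decide where each statement goes. The following is a minimal sketch of that decision, assuming statements can be classified by their first keyword. `ReadWriteRouter` and the server names are hypothetical and not part of Slony-I.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send SELECTs to slaves in round-robin order and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, the master serves reads as well.
        self._slaves = cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["slave1", "slave2"])
    for statement in ("SELECT * FROM users", "UPDATE users SET name = 'x'", "SELECT 1"):
        print(router.route(statement), "<-", statement)
```

Classifying by the first keyword is a simplification: a SELECT that calls a writing function, or SELECT ... FOR UPDATE, still belongs on the master. Because replication is asynchronous, a read routed to a slave may also return slightly stale data.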

Page 10: PostgreSQL Scaling And Failover

Statement-Based Replication Middleware

Intercept SQL queries, send writes to all servers, reads to any server

Possible issues with random(), CURRENT_TIMESTAMP, & sequences: each server evaluates them independently, so the copies can diverge

pgpool-II
– Connection Pooling, Replication, Load Balancing, Parallel Queries, Failover
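
The issue with non-deterministic functions can be shown with a toy model: two servers replay the same statement, each calling its own random number generator, and end up with different rows. The `Server` class and both helper functions are illustrative assumptions and do not describe how pgpool-II is implemented.

```python
import random


class Server:
    """Toy database server holding a single table of floats."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # each server has its own random state
        self.rows = []

    def execute_insert_random(self):
        # Models: INSERT INTO t VALUES (random()), evaluated on this server.
        self.rows.append(self.rng.random())

    def execute_insert_literal(self, value):
        # Models: INSERT INTO t VALUES (<literal>)
        self.rows.append(value)


def replicate_statement(servers):
    """Statement-based replication: every server re-evaluates random() itself."""
    for server in servers:
        server.execute_insert_random()


def replicate_value(servers, rng):
    """Workaround: evaluate once in the middleware and ship the literal value."""
    value = rng.random()
    for server in servers:
        server.execute_insert_literal(value)


if __name__ == "__main__":
    a, b = Server(seed=1), Server(seed=2)
    replicate_statement([a, b])
    print("statement replay, rows match:", a.rows == b.rows)
    replicate_value([a, b], random.Random(3))
    print("literal replay, last rows match:", a.rows[-1] == b.rows[-1])
```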

Page 11: PostgreSQL Scaling And Failover

pgpool-II

Page 12: PostgreSQL Scaling And Failover

Synchronous Multimaster Replication

Writes & reads on any server

Not implemented in PostgreSQL, but application code can mimic via two-phase commit
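
A minimal in-memory sketch of the two-phase commit idea follows: the coordinator asks every server to prepare, and commits only if all of them agree. The `Participant` class and `two_phase_commit` function are hypothetical. Against real servers the three steps would correspond to PostgreSQL's PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED commands.

```python
class Participant:
    """Toy stand-in for one database server taking part in a distributed write."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.committed = []
        self._pending = None

    def prepare(self, change):
        # Phase 1: promise to commit, or vote no.
        if self.fail_on_prepare:
            return False
        self._pending = change
        return True

    def commit(self):
        # Phase 2: make the prepared change durable.
        if self._pending is not None:
            self.committed.append(self._pending)
            self._pending = None

    def rollback(self):
        self._pending = None


def two_phase_commit(participants, change):
    """Apply `change` on every participant, or on none of them."""
    prepared = []
    for participant in participants:
        if participant.prepare(change):
            prepared.append(participant)
        else:
            # One "no" vote aborts the transaction everywhere.
            for other in prepared:
                other.rollback()
            return False
    for participant in participants:
        participant.commit()
    return True


if __name__ == "__main__":
    healthy = [Participant("db1"), Participant("db2")]
    print(two_phase_commit(healthy, "INSERT 1"), [p.committed for p in healthy])
    mixed = [Participant("db1"), Participant("db2", fail_on_prepare=True)]
    print(two_phase_commit(mixed, "INSERT 2"), [p.committed for p in mixed])
```

The sketch leaves out the hard part of the protocol, which is recovering when the coordinator itself fails between the two phases.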

Page 13: PostgreSQL Scaling And Failover

Load Balancing Issue

Scaling writes breaks down at a certain point
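
A rough model shows why. With full replication every server must apply every write, so adding servers only adds read capacity. The sketch below assumes each server handles a fixed number of operations per second and that reads and writes cost the same. Both are simplifying assumptions, and the numbers are made up.

```python
def max_read_throughput(servers: int, capacity_per_server: int, writes_per_sec: int) -> int:
    """Reads per second the cluster can serve when every server applies every write."""
    spare_per_server = max(capacity_per_server - writes_per_sec, 0)
    return servers * spare_per_server


if __name__ == "__main__":
    for writes in (200, 800, 1000):
        reads = max_read_throughput(servers=4, capacity_per_server=1000, writes_per_sec=writes)
        print(f"{writes} writes/s leaves room for {reads} reads/s across 4 servers")
```

Once the write rate reaches what a single server can absorb, adding more replicas no longer helps. That is the point at which partitioning becomes the remaining horizontal option.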

Page 14: PostgreSQL Scaling And Failover

Partitioning

Requires heavy application modification

Performing queries across partitions is problematic (and sometimes not possible)

PL/Proxy can help
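
The application changes come from having to pick a partition for every key. The sketch below is a minimal hash-based router, assuming rows are keyed by username. `shard_for` and `ShardedUsers` are hypothetical names, and the CRC32 hash is an arbitrary choice made for the example.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class ShardedUsers:
    """Toy user table split across several in-memory 'servers'."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, username, row):
        self.shards[shard_for(username, len(self.shards))][username] = row

    def get(self, username):
        # Single-key lookups touch exactly one shard.
        return self.shards[shard_for(username, len(self.shards))].get(username)

    def count_all(self):
        # Cross-partition queries must visit every shard and merge the results.
        return sum(len(shard) for shard in self.shards)


if __name__ == "__main__":
    users = ShardedUsers(shard_count=4)
    for name in ("alice", "bob", "carol"):
        users.put(name, {"name": name})
    print(users.get("bob"), users.count_all())
```

Changing `shard_count` remaps most keys with this scheme, so resharding means moving data. Growth is usually planned up front for that reason.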

Page 15: PostgreSQL Scaling And Failover

Vertical DB Scaling

“Buying a bigger box is quick(ish). Redesigning software is not.”
– Cal Henderson, Flickr

37signals Basecamp upgraded to 128 GB DB server: “don’t need to pay the complexity tax yet”
– David Heinemeier Hansson, Ruby on Rails

Page 16: PostgreSQL Scaling And Failover

Sites Running on Single DB

Stack Overflow
– MS SQL, 48GB RAM, RAID 1 OS, RAID 10 for data

37signals Basecamp
– MySQL, 128GB RAM. Dell R710 or Dell 2950

Page 17: PostgreSQL Scaling And Failover

High Availability Overview

Page 18: PostgreSQL Scaling And Failover

High Availability

Application still up even after node failure
– (Also try to prevent failure with appropriate hardware)

PostgreSQL High Availability Options
– pgpool-II
– Shared Disk Failover
– File System Replication
– Warm Standby with Point-In-Time Recovery (PITR)

Often still need heartbeat application

Page 19: PostgreSQL Scaling And Failover

Shared Disk Failover

Use a single disk array to hold the database's data files
– Network Attached Storage (NAS)
– Network File System (NFS)

Disk array is a single point of failure

Need heartbeat to bring 2nd server online

Page 20: PostgreSQL Scaling And Failover

File System Replication

File system is mirrored to another computer

DRBD
– Linux block-device replication (mirrors the disk beneath the file system)

Need heartbeat to bring 2nd server online

Page 21: PostgreSQL Scaling And Failover

Point in Time Recovery

“Log shipping”
– Write Ahead Logs sent to and replayed on standby
– Included in PostgreSQL 8.0+
– Asynchronous: potential loss of data

Warm Standby
– Standby's hardware very similar to primary's
– Need heartbeat to bring 2nd server online
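
The data-loss window can be seen in a toy model of log shipping: the primary only ships a log segment once it is full, so writes in the current, unfinished segment never reach the standby if the primary dies. The classes and the 3-record segment size are assumptions made for illustration. PostgreSQL itself ships WAL segment files (16 MB by default) through its archive_command setting.

```python
class Primary:
    """Toy primary that archives its write-ahead log one full segment at a time."""

    def __init__(self, segment_size=3):
        self.segment_size = segment_size
        self.data = {}
        self.current_segment = []  # records not yet shipped
        self.archive = []          # completed segments, visible to the standby

    def write(self, key, value):
        self.data[key] = value
        self.current_segment.append((key, value))
        if len(self.current_segment) == self.segment_size:
            self.archive.append(self.current_segment)
            self.current_segment = []


class Standby:
    """Toy warm standby that replays archived segments in order."""

    def __init__(self):
        self.data = {}
        self.replayed = 0

    def replay(self, archive):
        for segment in archive[self.replayed:]:
            for key, value in segment:
                self.data[key] = value
            self.replayed += 1


if __name__ == "__main__":
    primary, standby = Primary(), Standby()
    for i in range(4):
        primary.write(f"k{i}", i)
    standby.replay(primary.archive)  # primary fails right after this point
    lost = sorted(set(primary.data) - set(standby.data))
    print("lost on failover:", lost)
```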

Page 22: PostgreSQL Scaling And Failover

Heartbeat

“STONITH” (Shoot The Other Node In The Head)
– Prevent multiple nodes thinking they are the master

Linux-HA
– Creates cluster, takes nodes out when they fail
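
The heartbeat idea can be sketched as a timeout check on the standby: if the primary has been silent for too long, fence it first and only then promote the standby. `HeartbeatMonitor` and its callbacks are hypothetical and do not reflect the Linux-HA API. Timestamps are passed in explicitly so the logic can be followed without real clocks.

```python
class HeartbeatMonitor:
    """Runs on the standby and decides when to take over from the primary."""

    def __init__(self, timeout_seconds, fence, promote):
        self.timeout_seconds = timeout_seconds
        self.fence = fence      # callback that powers off or isolates the old primary
        self.promote = promote  # callback that brings the standby online
        self.last_beat = None
        self.failed_over = False

    def beat(self, now):
        """Record a heartbeat received from the primary at time `now`."""
        self.last_beat = now

    def check(self, now):
        """Return True if this call triggered a failover."""
        if self.failed_over or self.last_beat is None:
            return False
        if now - self.last_beat <= self.timeout_seconds:
            return False
        # Fence before promoting so two nodes never act as master at once.
        self.fence()
        self.promote()
        self.failed_over = True
        return True


if __name__ == "__main__":
    events = []
    monitor = HeartbeatMonitor(5, lambda: events.append("fence"), lambda: events.append("promote"))
    monitor.beat(now=0)
    print(monitor.check(now=3), monitor.check(now=6), events)
```

A missed heartbeat may mean a broken network link rather than a dead primary, which is why the fencing step runs before promotion.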

Page 23: PostgreSQL Scaling And Failover

Additional Options

Page 24: PostgreSQL Scaling And Failover

Additional Options

Tune PostgreSQL
– Defaults designed to “run anywhere”
– pgbench, VACUUM/ANALYZE

Tune Queries
– EXPLAIN

Caching (avoid the database)
– memcached
– Ehcache
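
The caching option usually takes the form of a cache-aside lookup: check the cache, fall back to the database on a miss, and store the result with an expiry. The sketch below uses an in-process dictionary as a stand-in for memcached or Ehcache. `CacheAside` and the `load_from_db` callback are hypothetical names.

```python
import time


class CacheAside:
    """Look in the cache first and call the database only on a miss or expiry."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load_from_db = load_from_db
        self._ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                 # hit: the database is not touched
        value = self._load_from_db(key)     # miss: one database round trip
        self._entries[key] = (now + self._ttl_seconds, value)
        return value

    def invalidate(self, key):
        """Call after writing `key` to the database so readers do not see stale data."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    queries = []

    def load_from_db(key):
        queries.append(key)
        return key.upper()

    cache = CacheAside(load_from_db, ttl_seconds=60)
    cache.get("user:1")
    cache.get("user:1")
    print("database queries issued:", len(queries))
```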

Page 25: PostgreSQL Scaling And Failover

Radical Additional Options

“NoSQL” database
– CouchDB, MongoDB, HBase, Cassandra, Redis
– Document store
– Map/Reduce querying

Page 26: PostgreSQL Scaling And Failover

Suggested Architecture

Page 27: PostgreSQL Scaling And Failover

Current Production Setup

DB and Web server on same machine

No failover

Page 28: PostgreSQL Scaling And Failover

Suggested Architecture

2 nice machines

Point in Time Recovery with Heartbeat

Tune PostgreSQL

Monitor & improve slow queries

Add in Ehcache as we touch code

→ Leave horizontal scaling for another day

Page 29: PostgreSQL Scaling And Failover

Initial Architecture

High Availability

Page 30: PostgreSQL Scaling And Failover

Future Architecture

Scale up application servers horizontally as needed

Improve DB Hardware

Page 31: PostgreSQL Scaling And Failover

Hardware Options

PostgreSQL typically constrained by RAM & Disk IO, not processor

64-bit, as much memory as possible

Data Array
– RAID10 with 4 drives (not RAID 5), 15k RPM

Separate OS Drive / Array
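
The RAID 10 recommendation can be checked with simple arithmetic. The sketch compares usable capacity and the commonly quoted small-write penalty (disk operations per logical write) for the two layouts. The 146 GB drive size is an assumption made for the example and is not a figure from the slides.

```python
def raid10(drives: int, size_gb: int) -> dict:
    """Striped mirrors: half the raw capacity, 2 disk writes per logical write."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return {"usable_gb": drives // 2 * size_gb, "write_penalty": 2}


def raid5(drives: int, size_gb: int) -> dict:
    """Single parity: loses one drive of capacity, 4 disk operations per small write."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return {"usable_gb": (drives - 1) * size_gb, "write_penalty": 4}


if __name__ == "__main__":
    print("RAID 10:", raid10(4, 146))
    print("RAID 5: ", raid5(4, 146))
```

RAID 5 yields more usable space from the same four drives, but each small random write costs about twice the disk operations, which matters for a write-heavy database.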

Page 32: PostgreSQL Scaling And Failover

Dell R710

Processor: Xeon

4x 15k HD in RAID10

24GB (3x 8GB) RAM (up to 6x 16GB)

=$6,905

Page 33: PostgreSQL Scaling And Failover

Other Considerations

Should have Test environment mimic Production
– Same database setup
– Provides environment for experimentation

Can host multiple DBs on single cluster

Page 34: PostgreSQL Scaling And Failover

References

http://37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding

http://37signals.com/svn/posts/1819-basecamp-now-with-more-vroom

http://anchor.com.au/hosting/dedicated/Tuning_PostgreSQL_on_your_Dedicated_Server

http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-log-shipping-with-pg-standby.html

http://blog.stackoverflow.com/2009/01/new-stack-overflow-servers-ready/

http://developer.postgresql.org/pgdocs/postgres/high-availability.html

http://developer.postgresql.org/pgdocs/postgres/pgbench.html

https://developer.skype.com/SkypeGarage/DbProjects/PlProxy

http://wiki.postgresql.org/wiki/Performance_Optimization

http://www.postgresql.org/docs/8.4/static/warm-standby.html

http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/

http://www.slony.info/

Page 35: PostgreSQL Scaling And Failover

Additional Links

http://ehcache.org/

http://highscalability.com/skype-plans-postgresql-scale-1-billion-users

http://www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSchemes.aspx

http://www.danga.com/memcached/

http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/

http://www.slideshare.net/iamcal/scalable-web-architectures-common-patterns-and-approaches-web-20-expo-nyc-presentation
