
Zero Downtime Postgres Upgrades

Apr 11, 2017

Outlyer
Transcript
Page 1: Zero Downtime Postgres Upgrades

Zero-downtime Postgres upgrades

Restarting databases without the apps noticing

@ChrisSinjo

Page 2

GOCARDLESS

Page 3

POST /cash/monies HTTP/1.1

{ "amount": 100 }

💰💰💰

Page 4

High 💵 per-request

Page 5

Uptime is 🔑

Page 6
Page 7

Good durability guarantees

Page 8

Good durability guarantees

Feature-cautious

Page 9

Good durability guarantees

Feature-cautious

Transactions are cool

Page 10

“Speak to this one node.”

–Postgres

Page 11

Client

Postgres

Page 12

Client

Postgres Postgres Replication

Page 13

Client

Postgres Postgres Replication

Page 14

Wake a human up

Page 15

Client

Postgres Postgres Replication

Page 16

Client

Postgres Postgres

Page 17

Client

Postgres Postgres

Page 18

Client

Postgres Postgres Replication

Page 19

Awful time-to-recovery

Error-prone

Page 20

You gotta perform:

- Many steps
- In the right order
- Perfectly

Page 21

Don’t make a tired SRE think

Page 22

Add automation

Page 23

Pacemaker

A clustering tool

Page 24

Client

Postgres Postgres Replication

Page 25

How do we know a node has failed?

Page 26

Jepsen https://aphyr.com/tags/jepsen

Page 27

https://aphyr.com/posts/317-jepsen-elasticsearch

Page 28

Client

Postgres Postgres Replication

Page 29

Client

Postgres Postgres Postgres Repl Repl

Page 30

Client

Postgres Postgres Postgres Repl Repl

Pacemaker Pacemaker Pacemaker
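Three nodes let the cluster answer "has a node failed?" by voting instead of trusting one observer. A minimal sketch of the strict-majority rule, assuming one vote per node as in Corosync's votequorum default (Corosync being the messaging layer Pacemaker normally runs on), shows why three nodes beat two: after a network split, exactly one side can still hold a majority. The function names are illustrative, not Pacemaker's API.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    # 3 nodes -> 2 votes, 4 nodes -> 3 votes, 5 nodes -> 3 votes
    return total_votes // 2 + 1


def has_quorum(visible_votes: int, total_votes: int) -> bool:
    """True if the nodes one partition can see form a strict majority.

    In this sketch only a partition with quorum may keep (or take over) the
    primary role; the minority side stops, so two primaries never coexist.
    """
    return visible_votes >= quorum_threshold(total_votes)


if __name__ == "__main__":
    # Three-node cluster split 2 | 1: exactly one side keeps quorum.
    print(has_quorum(2, 3), has_quorum(1, 3))  # True False
    # Two-node cluster split 1 | 1: neither side has a majority.
    print(has_quorum(1, 2))  # False
```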

Page 31

Client

Postgres Postgres Postgres Repl Repl

Pacemaker Pacemaker Pacemaker

VIP

Page 32

Client

Postgres Postgres Postgres Repl Repl

Pacemaker Pacemaker Pacemaker

VIP

Page 33

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

VIP

Page 34

Postgres Postgres Postgres Repl

Pacemaker Pacemaker Pacemaker

Client

VIP

Page 35

Postgres Postgres Postgres Repl

Pacemaker Pacemaker Pacemaker

Client

VIP

Page 36

Postgres Postgres Postgres Repl

Pacemaker Pacemaker Pacemaker

Client

VIP

Page 37

Client

Postgres Postgres Postgres Repl

Repl

VIP

Pacemaker Pacemaker Pacemaker
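The failover animation above fits in a few lines of Python. This toy sketch assumes clients only ever connect through the VIP, that the first listed replica is the one promoted (a simplification of whatever Pacemaker decides), and that moving the VIP resets every connection open to the old node. The `Cluster` class and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy model of the diagram: clients only ever talk to the VIP."""
    primary: str              # node currently holding the VIP
    replicas: list[str]       # standbys that could be promoted
    open_connections: int = 0

    def connect(self) -> str:
        """A client connects via the VIP and lands on the current primary."""
        self.open_connections += 1
        return self.primary

    def fail_over(self) -> str:
        """Promote the first replica and move the VIP onto it.

        The failed node stays out of the replica list until someone repairs
        it. Every connection that was open through the VIP is reset.
        """
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        self.primary = self.replicas.pop(0)
        dropped, self.open_connections = self.open_connections, 0
        return f"promoted {self.primary}, dropped {dropped} connections"


if __name__ == "__main__":
    cluster = Cluster(primary="pg1", replicas=["pg2", "pg3"])
    cluster.connect()
    cluster.connect()
    print(cluster.fail_over())  # promoted pg2, dropped 2 connections
```

Recovery is automatic, but the dropped-connection count is exactly the cost the next slides turn to.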

Page 38

$💯

Page 39

Seems hard, right?

Page 40

It kinda is

Page 41

You gotta know:

- Postgres
- Distributed systems
- Pacemaker

Page 42

Get someone else to run it for you

Page 43

Client

Postgres Postgres Postgres Repl Repl

Pacemaker Pacemaker Pacemaker

VIP

Page 44

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

VIP

Page 45

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

VIP

Page 46

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

VIP

Page 47

Every move means a connection reset

Page 48

Every move means dropped requests

Page 49

POST /cash/monies HTTP/1.1

{ "amount": 100 }

💰💰💰

Page 50

POST /cash/monies HTTP/1.1

{ "amount": 100 }

500 Internal Server Error

Page 51

What does this mean for upgrades?

Page 52

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

VIP

Page 53

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

9.4.9 9.4.9 9.4.9

VIP

Page 54

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

9.4.9 9.4.9 9.4.9

Repl Repl

VIP

Page 55

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

9.4.10 9.4.9 9.4.10

Repl Repl

VIP

Page 56

Client

Postgres Postgres Postgres Repl

Repl

VIP

Pacemaker Pacemaker Pacemaker

9.4.10 9.4.9 9.4.10
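Pages 53 to 56 show the order of a rolling minor-version upgrade: restart the replicas on the new version while the primary keeps serving, move the primary role onto an upgraded replica, then upgrade the old primary last. A hypothetical planner that just spells out that order (node names are made up for illustration):

```python
def rolling_upgrade_plan(primary: str, replicas: list[str], target: str) -> list[str]:
    """Order of operations for a minor-version rolling upgrade (a sketch).

    Replicas are upgraded first while the primary keeps serving traffic.
    Only then is the primary role moved to an upgraded replica, and the
    old primary is upgraded once it is no longer taking writes.
    """
    if not replicas:
        raise ValueError("need at least one replica to move the primary onto")
    steps = [f"upgrade {node} to {target}" for node in replicas]
    new_primary = replicas[0]
    # This is the only step that touches live client connections.
    steps.append(f"migrate primary from {primary} to {new_primary}")
    steps.append(f"upgrade {primary} to {target} and re-attach it as a replica")
    return steps


if __name__ == "__main__":
    for step in rolling_upgrade_plan("pg2", ["pg1", "pg3"], "9.4.10"):
        print(step)
```

Only the migration step moves the primary, which is why the following pages treat it as the one that costs requests.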

Page 57

Every upgrade means a connection reset

Page 58

Every upgrade means dropped requests

Page 59

POST /cash/monies HTTP/1.1

{ "amount": 100 }

500 Internal Server Error

Page 60

Solution: never upgrade

Page 61

🙄

Page 62

Not upgrading is never an option

Page 63

Solution: never upgrade

Page 64

Solution: never upgrade

Page 65

Solution: ???

Page 66

1 thing missing

Page 67

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

VIP

Page 68

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

PgBouncer PgBouncer PgBouncer VIP

Page 69

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

PgBouncer PgBouncer PgBouncer VIP

Page 70

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

PgBouncer PgBouncer PgBouncer VIP

VIP

Page 71

PgBouncer has This One Weird Trick™

Page 72

PAUSE;
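PgBouncer's admin-console `PAUSE` stops handing queries to Postgres and waits for running ones to finish, while clients stay connected to PgBouncer; `RESUME` lets the held queries through. The toy model below is a sketch of that behaviour, not PgBouncer's code: `ToyPooler` and its `repoint` method are hypothetical stand-ins for the pooler and for whatever moves traffic to the new primary while it is paused.

```python
from collections import deque


class ToyPooler:
    """Toy model of PgBouncer-style PAUSE/RESUME (illustration only)."""

    def __init__(self, backend: str) -> None:
        self.backend = backend                # server the pooler forwards to
        self.paused = False
        self.waiting: deque[str] = deque()    # client queries held while paused
        self.log: list[tuple[str, str]] = []  # (query, backend) pairs that ran

    def submit(self, query: str) -> None:
        """A client sends a query; its connection to the pooler stays open."""
        if self.paused:
            self.waiting.append(query)        # held, not rejected
        else:
            self.log.append((query, self.backend))

    def pause(self) -> None:
        # Real PAUSE also blocks until in-flight queries finish; this
        # synchronous toy never has a query in flight, so it just flips a flag.
        self.paused = True

    def repoint(self, backend: str) -> None:
        """Hypothetical stand-in for moving traffic to the new primary."""
        self.backend = backend

    def resume(self) -> None:
        """Release held queries, in arrival order, to the current backend."""
        self.paused = False
        while self.waiting:
            self.log.append((self.waiting.popleft(), self.backend))


if __name__ == "__main__":
    pooler = ToyPooler("pg2")
    pooler.submit("q1")   # runs on the old primary
    pooler.pause()
    pooler.submit("q2")   # arrives mid-move and waits
    pooler.repoint("pg1")
    pooler.resume()
    print(pooler.log)     # [('q1', 'pg2'), ('q2', 'pg1')]
```

Query `q2` sees extra latency instead of an error, which is the whole point of the trick.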

Page 73

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

PgBouncer PgBouncer PgBouncer VIP

VIP

Page 74

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

PgBouncer PgBouncer PgBouncer VIP

VIP

PAUSE;

Page 75

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

PgBouncer PgBouncer PgBouncer VIP

PAUSE;

VIP

Page 76

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

PgBouncer PgBouncer PgBouncer VIP

PAUSE;

VIP

Page 77

So what does this mean for upgrades?

Page 78

Client

Postgres Postgres Postgres

Pacemaker Pacemaker Pacemaker

PgBouncer PgBouncer PgBouncer VIP

VIP

Page 79

Client

Postgres Postgres Postgres

PgBouncer PgBouncer PgBouncer VIP

VIP

Page 80

Client

Postgres Postgres Postgres

PgBouncer PgBouncer PgBouncer VIP

VIP

9.4.10 9.4.9 9.4.10

Page 81

Client

Postgres Postgres Postgres

PgBouncer PgBouncer PgBouncer VIP

VIP

9.4.10 9.4.9 9.4.10

PAUSE;

Page 82

Client

Postgres Postgres Postgres

PgBouncer PgBouncer PgBouncer VIP

9.4.10 9.4.9 9.4.10

VIP

PAUSE;

Page 83

Client

Postgres Postgres Postgres

PgBouncer PgBouncer PgBouncer VIP

9.4.10 9.4.9 9.4.10

VIP

RESUME;

Page 84

Client

Postgres Postgres Postgres

PgBouncer PgBouncer PgBouncer VIP

9.4.10 9.4.10 9.4.10

VIP

RESUME;
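Pages 81 to 84 are the whole upgrade trick: PAUSE, move the primary, RESUME. A minimal Python sketch of that wrapper follows. It assumes DB-API connections (for example psycopg2) opened to each PgBouncer's admin console, which is the special `pgbouncer` database, with autocommit on so the driver does not send `BEGIN` first. The helper names are hypothetical; only the `PAUSE;` and `RESUME;` commands are PgBouncer's own.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Sequence


def _run(conn: Any, sql: str) -> None:
    """Execute one admin-console command on a DB-API connection."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
    finally:
        cur.close()


@contextmanager
def pgbouncers_paused(admin_conns: Sequence[Any]) -> Iterator[None]:
    """PAUSE every given PgBouncer, run the body, then always RESUME.

    Each connection is assumed to be an autocommit DB-API connection to a
    PgBouncer admin console.
    """
    paused = []
    try:
        for conn in admin_conns:
            # PAUSE returns only once PgBouncer has released its server
            # connections, so the body starts with no queries in flight.
            _run(conn, "PAUSE;")
            paused.append(conn)
        yield
    finally:
        # Resume everything we paused, even if the migration step raised,
        # so held client queries are never stranded.
        for conn in paused:
            _run(conn, "RESUME;")
```

A caller would wrap the promotion step, e.g. `with pgbouncers_paused(conns): promote_new_primary()`, where `promote_new_primary` is the hypothetical step from the slides. In transaction pooling mode PAUSE has to wait for open transactions to finish, which is why long-running transactions show up in the caveats.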

Page 85

$💯

Page 86

Caveats

Page 87

Minor versions

Page 88

9.4.9 → 9.4.10
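The caveat comes from replication itself. Physical streaming replication only works between servers on the same major version, and minor releases keep the on-disk format, so a 9.4.9 primary can stream to 9.4.10 replicas. A major-version jump needs logical replication, such as pglogical, instead. A small hypothetical helper that tells the two cases apart, given that the major version is the first two numbers before PostgreSQL 10 and the first number from 10 onwards (it assumes plain numeric version strings, no "beta" suffixes):

```python
def major_version(version: str) -> str:
    """Return the PostgreSQL major version of a plain numeric version string.

    Before PostgreSQL 10 the major version is the first two numbers
    (9.4.10 -> 9.4); from 10 onwards it is the first number (10.3 -> 10).
    """
    parts = version.split(".")
    if int(parts[0]) >= 10:
        return parts[0]
    return ".".join(parts[:2])


def is_minor_upgrade(current: str, target: str) -> bool:
    """True if both versions share a major version, so streaming replication
    (and therefore the rolling PAUSE/RESUME approach) can span them."""
    return major_version(current) == major_version(target)


if __name__ == "__main__":
    print(is_minor_upgrade("9.4.9", "9.4.10"))  # True
    print(is_minor_upgrade("9.4.10", "9.6.2"))  # False
```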

Page 89

pglogical

Page 90

Minor versions

Long-running transactions

Page 91

while(running_queries):
  if(now > timeout):
    abandon_migration
  else:
    sleep(0.1)

promote_new_primary
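A runnable Python version of that loop, as a sketch: `count_running` is a hypothetical caller-supplied function (for example one that counts active sessions), and the clock and sleep are injectable so the timeout logic can be tested without actually waiting.

```python
import time
from typing import Callable


def wait_for_queries_to_drain(
    count_running: Callable[[], int],
    timeout_s: float,
    poll_interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no queries are running, or give up after timeout_s.

    Returns True if traffic drained (safe to promote the new primary) and
    False if the caller should abandon the migration and resume traffic.
    """
    deadline = clock() + timeout_s
    while count_running() > 0:
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
    return True
```

The caller then runs `promote_new_primary` only when this returns True, and otherwise resumes traffic and abandons the migration, as on the slide.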

Page 92

Minor versions

Long-running transactions

Pause length

Page 93

7-10s total

Page 94

$💯

Page 95

One more thing… (#sorrynotsorry)

Page 96

github.com/gocardless/our-postgresql-setup

Page 97

We’re hiring ✌❤

@ChrisSinjo @GoCardlessEng

Page 98

Thank you ✌❤

@ChrisSinjo @GoCardlessEng

Page 99

Questions? ✌❤

@ChrisSinjo @GoCardlessEng