
PostgreSQL Replication 2 - PGConf ASIA · Title: Slide 1 Author: chisato tomita Created Date: 12/4/2017 9:18:04 AM

Aug 01, 2018


Copyright©2017 NTT corp. All Rights Reserved.

PostgreSQL Replication 2.0

NTT OSS Center

Masahiko Sawada

PGConf.ASIA 2017

Who am I

Masahiko Sawada (@sawada_masahiko)

NTT Open Source Software Center

PostgreSQL contributor

PostgreSQL technical support

Maintenance of PostgreSQL-related tools

Will talk again tomorrow: "PostgreSQL Built-in Sharding: Enabling big data management with the blue elephant"


POSTGRESQL REPLICATION IS AWESOME!

Index

• What is Database Replication?
• The History of PostgreSQL Replication
• Logical Replication Has Come
• Summary


WHAT IS DATABASE REPLICATION?

What is Database Replication?

• Keeping a copy of the data on multiple machines
  • Continue working even if some parts of the system have failed
  • Keep data geographically close to your users
  • Scale out the number of machines that can serve read queries
• Master and Standby
  • Also called Primary and Slave, or Leader and Follower
• Replication Topology

What to Replicate

UPDATE tbl SET price = 100 WHERE id = 'ABC000';

• Log shipping: 97d0 0700 1800 ef12 0300 55b1 0300 54b1 0300 0000 0000 0000 ...
• Statement-based: "UPDATE tbl SET price = 100 WHERE id = 'ABC000';"
• Row-based: Table: tbl, Key: id = 'ABC000', Row: id = 'ABC000', price = 100

[Figure: Client, Master and Standby]
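
The difference between statement-based and row-based shipping can be sketched in a few lines of Python. This is only an illustration under assumptions: the in-memory tables, the function names and the `updated_at` column are hypothetical, and the statement uses `now()` rather than the constant price from the slide so that re-execution visibly diverges.

```python
def run_update(table, local_clock):
    # Stands in for: UPDATE tbl SET updated_at = now() WHERE id = 'ABC000';
    # local_clock is whatever now() would return on the server executing it.
    table["ABC000"] = {"updated_at": local_clock}

def apply_row(table, key, row):
    # Row-based: store the row image exactly as it was produced on the master.
    table[key] = dict(row)

master, standby_stmt, standby_row = {}, {}, {}

run_update(master, local_clock=100)        # executed on the master
run_update(standby_stmt, local_clock=105)  # statement re-executed later on a standby
apply_row(standby_row, "ABC000", master["ABC000"])  # row image shipped instead

print(standby_stmt == master)  # False: the re-executed statement diverged
print(standby_row == master)   # True: the shipped row matches
```

Log shipping avoids the same problem in a different way, by copying the bytes the master wrote rather than re-running anything.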


THE HISTORY OF POSTGRESQL REPLICATION


PostgreSQL Replication is Awesome

Replication 1.0

• Streaming replication

• Asynchronous replication

• Synchronous replication

• Cascading replication

Replication 2.0

• Logical replication

• Multi-master replication

etc

Back in 2008...

• Streaming (physical) replication was introduced
• Log (Write-Ahead Log) shipping
• Builds an exact copy of the database cluster
• Single master, multiple standbys

[Figure: the client reads and writes on the master; the asynchronous standbys are read-only]

Basic Architecture of Streaming Replication

• The master server sends WAL and the standby server receives it
• The standby server keeps receiving WAL and applying it
• A standby server can be promoted to become the new master server

[Figure: on COMMIT the master writes WAL and returns OK; it sends the WAL to the standby, which writes it and then applies it]

Basic Architecture in detail

[Figure: master (backend, wal sender, WAL, table) and standby (wal receiver, startup process, WAL, table). Labelled steps: 1. Modify / 1. Write, 2. Notify, 3. Read, 4. Send, 5. Write, 6. Notify, 7. Read, 8. Apply; the standby replies OK (LSNs)]

• The wal sender process sends WAL to the wal receiver process
• The standby server is performing archive recovery
• In asynchronous replication, the backend returns OK to the client after step 2

Asynchronous replication

• WAL is sent to the standby server asynchronously
• Low overhead
• Commit does NOT wait for the change to be replicated to the standby server
• Committed data could be lost on the standby server

[Figure: timeline of client, master and standby; labels: data change, OK]

Asynchronous replication (continued)

[Figure: same timeline with a fail-over; the standby becomes the new master before the data change reaches it, so a read returns the old value]
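
As a rough sketch of the data-loss window, the snippet below lists the commits that the master acknowledged but that never reached the standby before it was promoted. The integer LSNs and the function name are hypothetical simplifications; real LSNs are 64-bit WAL positions.

```python
def lost_on_failover(commit_lsns, standby_received_lsn):
    """Commits acknowledged by the master but missing on the promoted standby."""
    return [lsn for lsn in commit_lsns if lsn > standby_received_lsn]

# The master acknowledged commits at LSN 100, 200 and 300;
# the standby had only received WAL up to LSN 200 when it was promoted.
print(lost_on_failover([100, 200, 300], standby_received_lsn=200))  # [300]
```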

Read Replica (hot standby)

• Set "hot_standby = on" on the standby server
• Allows read-only SQL to be issued to a standby server
• Useful for read load balancing
• Note that results read from standby servers might be stale

Handling Query Conflicts

• Queries on a standby server can conflict with changes applied by streaming replication
  • DROP TABLE and SELECT
  • Vacuum cleanup and SELECT
  • Access Exclusive Lock and SELECT
  • etc.
• GUC parameters on the master server side
  • vacuum_defer_cleanup_age
• GUC parameters on the standby server side
  • hot_standby_feedback
  • max_standby_archive_delay
  • max_standby_streaming_delay

Synchronous Replication

• Only one standby can be the "synchronous" standby
• The others are asynchronous standbys
• synchronous_standby_names = 'server1, server2'

[Figure: master with one synchronous standby and several asynchronous standbys]

Synchronous Replication

• Commit waits for the data to be replicated to the standby server
• When a transaction commits, it is guaranteed that the data has been written on both the master server and the standby server

[Figure: timeline of client, master and standby; the master waits for the standby to flush the data change and reply OK before it returns OK to the client]

pg_basebackup

• Takes a backup of the whole database cluster
• Makes it easy to set up a standby server
• pg_basebackup uses a replication connection
  • It connects to a wal sender
• The master server needs to allow replication connections
• Consumes one of the max_wal_senders connections

Cascading Replication

• A standby server can have standby servers of its own

[Figure: master with a synchronous standby and asynchronous standbys, which in turn feed cascading asynchronous standbys]

An old problem of replication

• Required WAL might be archived and removed on the master server while the standby is disconnected
  • "FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000007 has already been removed"
• Solutions
  • restore_command = 'scp hostname:/path/to/%f %p'
    • What happens if the archived WAL has also been removed?
  • wal_keep_segments
    • What happens if much more WAL is generated than estimated?
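
To see why a fixed wal_keep_segments value is fragile, here is a back-of-the-envelope sketch in Python. The workload figures and the function name are made up for illustration; the only PostgreSQL fact assumed is the default WAL segment size of 16 MB.

```python
import math

SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16 MB)

def segments_needed(wal_bytes_per_second, outage_seconds):
    """WAL segments generated while a standby is disconnected."""
    return math.ceil(wal_bytes_per_second * outage_seconds / SEGMENT_BYTES)

# Hypothetical estimate: 1 MiB of WAL per second, standby down for one hour.
estimate = segments_needed(1024 * 1024, 3600)
print(estimate)  # 225

# If the real write rate turns out to be twice the estimate, the retained
# segments no longer cover the outage and the standby hits the error above.
actual = segments_needed(2 * 1024 * 1024, 3600)
print(actual > estimate)  # True
```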

Replication Slots

• Ensure that the master doesn't remove WAL segments until the standby has received them
• Streaming replication can use a replication slot
  • primary_slot_name in recovery.conf
• Note that a dangling replication slot can fill up the disk
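
The retention rule can be sketched as follows. This is a simplified model, not PostgreSQL's actual logic: segment numbers are plain integers, the slot names are hypothetical, and wal_keep_segments and checkpoints are ignored.

```python
def removable_segments(segments, slot_restart_segments):
    """Return the WAL segments the master may remove.

    segments: segment numbers currently on the master.
    slot_restart_segments: oldest segment still needed, per replication slot.
    """
    if not slot_restart_segments:
        return sorted(segments)  # simplification: nothing holds WAL back
    oldest_needed = min(slot_restart_segments.values())
    return sorted(seg for seg in segments if seg < oldest_needed)

segments = range(1, 11)

# standby2 has been disconnected for a while and still needs segment 3.
print(removable_segments(segments, {"standby1": 9, "standby2": 3}))  # [1, 2]

# Once the dangling slot is dropped, WAL can be removed up to standby1's position.
print(removable_segments(segments, {"standby1": 9}))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

A slot that nobody consumes keeps `oldest_needed` fixed, so every new segment accumulates on disk.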

Bringing the old master online after fail-over

• A full base backup is needed after fail-over to bring the old master server back online
• This takes a long time if the database is very large

[Figure: after fail-over the standby becomes the new master; the old master rejoins as a standby only after a full backup is taken from the new master]

pg_rewind

• Synchronizes a data directory with another data directory that was forked from it
• No full base backup is needed; only changed blocks are sent
• Unchanged blocks do not need to be read either

[Figure: after fail-over, pg_rewind sends only the deltas from the new master so that the old master can rejoin as a standby]

Reliability Control

synchronous_commit = [ off | local | remote_write | on | remote_apply ]

[Figure: timeline of client, master and standby; while the master waits for the standby, the standby goes through write, flush and apply, and an OK can be returned after each stage]
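
A minimal sketch of what each synchronous_commit value waits for, assuming a synchronous standby is configured. The stage names and the helper function are hypothetical labels for this illustration, not PostgreSQL identifiers.

```python
# Stage of WAL processing that a commit waits for, per synchronous_commit value.
# "local_flush" happens on the master; the remote_* stages happen on the
# synchronous standby.
WAIT_FOR = {
    "off": None,                     # returns without waiting for any flush
    "local": "local_flush",
    "remote_write": "remote_write",
    "on": "remote_flush",
    "remote_apply": "remote_apply",
}

# Order in which the stages complete for one commit record.
STAGES = ["local_flush", "remote_write", "remote_flush", "remote_apply"]

def commit_returns(level, completed):
    """True once every stage up to the awaited one is in `completed`."""
    target = WAIT_FOR[level]
    if target is None:
        return True
    needed = STAGES[: STAGES.index(target) + 1]
    return all(stage in completed for stage in needed)

done = {"local_flush", "remote_write", "remote_flush"}
print(commit_returns("on", done))            # True
print(commit_returns("remote_apply", done))  # False: not yet applied on the standby
```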

Reading Your Own Writes

• "remote_apply"
  • A transaction committed by a session on the master node is guaranteed to be visible to sessions on the standby once the commit has returned
• Read-balancing consistency

[Figure, synchronous_commit = on: the master returns OK for INSERT INTO tbl ... once the standby has written and flushed the change; a read on the standby issued before the change is applied does not see the previous write ("No result!")]

Reading Your Own Writes (continued)

[Figure, synchronous_commit = remote_apply: the master waits for the data to be applied on the standby before returning OK for INSERT INTO tbl ..., so the following read on the standby finds the row ("Found!")]

Monitoring

• Measuring replication lag
  • write_lag, flush_lag, replay_lag

SELECT application_name, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

 application_name |    write_lag    |    flush_lag    |   replay_lag
------------------+-----------------+-----------------+-----------------
 node1            | 00:00:00.022447 | 00:00:00.029091 | 00:00:00.68424
 node2            | 00:00:00.003227 | 00:00:00.004059 | 00:00:20.148379
 node3            | 00:00:00.020398 | 00:00:00.023971 | 00:00:00.614862
(3 rows)
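
One way to act on these numbers is to flag standbys whose replay lag exceeds a threshold. The sketch below parses the replay_lag values from the sample output; the 5-second threshold and the function names are arbitrary choices for illustration.

```python
from datetime import timedelta

def parse_interval(text):
    """Parse an 'HH:MM:SS.ffffff' interval as shown in the sample output."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))

# (application_name, replay_lag) pairs taken from the sample output.
rows = [
    ("node1", "00:00:00.68424"),
    ("node2", "00:00:20.148379"),
    ("node3", "00:00:00.614862"),
]

def lagging(rows, threshold):
    """Names of standbys whose replay lag is above the threshold."""
    return [name for name, lag in rows if parse_interval(lag) > threshold]

print(lagging(rows, timedelta(seconds=5)))  # ['node2']
```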

Multiple Synchronous Replication

• Allows more than one synchronous standby
• Two methods
  • priority-based
  • quorum-based

[Figure, priority-based: master with synchronous and asynchronous standbys; commit returns after OK is received from the 2 sync standbys]

[Figure, quorum-based: master with 5 quorum standbys; commit returns after OK is received from any 2 of the 5 standbys]
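
The two methods can be compared with a small model. This is a sketch under assumptions: the standby names s1 to s5 are hypothetical, and "FIRST" and "ANY" mirror the keywords accepted by synchronous_standby_names in PostgreSQL 10, e.g. 'FIRST 2 (s1, s2, s3, s4, s5)' and 'ANY 2 (s1, s2, s3, s4, s5)'.

```python
def sync_standbys(method, num_sync, names, connected):
    """Pick the standbys whose acknowledgement a commit may wait for."""
    candidates = [n for n in names if n in connected]
    if method == "FIRST":
        # Priority-based: the first num_sync connected standbys in list order.
        return candidates[:num_sync]
    if method == "ANY":
        # Quorum-based: every connected standby in the list is a candidate.
        return candidates
    raise ValueError(f"unknown method: {method}")

def commit_can_return(method, num_sync, names, connected, acked):
    """True when at least num_sync of the chosen standbys have confirmed."""
    chosen = sync_standbys(method, num_sync, names, connected)
    confirmed = [n for n in chosen if n in acked]
    return len(confirmed) >= num_sync

names = ["s1", "s2", "s3", "s4", "s5"]
connected = set(names)
acked = {"s3", "s5"}  # only these two have confirmed so far

print(commit_can_return("ANY", 2, names, connected, acked))    # True
print(commit_can_return("FIRST", 2, names, connected, acked))  # False: still waiting for s1 and s2
```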

Summary for Streaming Replication

• Write-Ahead Log shipping
• wal sender process and wal receiver process
• Single master, multiple standbys
• Asynchronous replication
• Synchronous replication
  • priority-based and quorum-based
• Cascading replication
• Replication lag


LOGICAL REPLICATION HAS COME!

Logical Replication

• Row-based
• Replicates a subset of a database
• Receives changes from multiple servers
• Replicates to a different major version of PostgreSQL
• Initial data copy
• Publication / Subscription model

Use cases

• Partial replication (sending a subset of a database)
• Consolidating multiple databases into a single one
• Online major version upgrades
• Multi-master replication

Publication / Subscription

• A publication is a set of changes generated from a table or a group of tables
• A subscription defines the set of publications to which it wants to subscribe

[Figure: a publisher with tables A, B, C and D, publications pubA and pubB, and two subscribers]

Initial Setup

-- On Publisher
CREATE TABLE tbl (k int primary key, v int);
CREATE PUBLICATION tbl_pub FOR TABLE tbl;
INSERT INTO tbl VALUES (1), (2), (3);

-- On Subscriber
CREATE TABLE tbl (k int primary key, v int);
CREATE SUBSCRIPTION tbl_sub CONNECTION '...' PUBLICATION tbl_pub;
SELECT * FROM tbl;
 k | v
---+---
 1 |
 2 |
 3 |
(3 rows)

Logical Decoding

• An infrastructure feature underlying Logical Replication
• Logical Replication sends decoded WAL data (row-based)

BEGIN;
CREATE TABLE tbl (c int primary key);
INSERT INTO tbl VALUES (1), (2), (3);
COMMIT;

SELECT lsn, data FROM pg_logical_slot_get_changes('slot', pg_current_wal_lsn(), 10);

    lsn     |                  data
------------+----------------------------------------
 1/331723E8 | BEGIN 242422
 1/3317A778 | table public.tbl: INSERT: c[integer]:1
 1/3317A848 | table public.tbl: INSERT: c[integer]:2
 1/3317A8C8 | table public.tbl: INSERT: c[integer]:3
 1/3317ABC0 | COMMIT 242422
(5 rows)

Basic Architecture in detail

[Figure: publisher (backend, wal sender, WAL, table) and subscriber (apply worker, WAL, table). Labelled steps: 1. Modify / 1. Write, 2. Notify, 3. Read, 4. Decode and Send, 5. Write / 5. Apply; the subscriber replies OK (LSNs)]

• After reading WAL, the wal sender process decodes it using an output plugin (pgoutput)
• What the wal sender sends is row-level changes

Basic Architecture in MORE detail

[Figure: backends write INSERT, UPDATE, DELETE and COMMIT records to WAL; the wal sender reads and decodes them (logical decoding), collects the changes in the reorder buffer, and on COMMIT passes them through the pgoutput callbacks (begin_cb, change_cb, commit_cb, origin_cb) to the apply worker]

Replication Slot

slot_name = 'slot'
plugin = 'pgoutput'
restart_lsn = X/ABC000
:
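
The role of the reorder buffer can be sketched as follows. This is a toy model rather than PostgreSQL's implementation: WAL records are reduced to hypothetical (xid, action) tuples, and the callback names are used only as labels for what an output plugin would receive.

```python
from collections import defaultdict

def decode(wal_records):
    """Group interleaved changes by transaction and emit them at COMMIT.

    wal_records: (xid, action) tuples in WAL order, where action is
    "INSERT", "UPDATE", "DELETE", "COMMIT" or "ABORT".
    Returns the callback sequence an output plugin would see.
    """
    reorder_buffer = defaultdict(list)
    emitted = []
    for xid, action in wal_records:
        if action == "COMMIT":
            emitted.append(("begin_cb", xid))
            for change in reorder_buffer.pop(xid, []):
                emitted.append(("change_cb", xid, change))
            emitted.append(("commit_cb", xid))
        elif action == "ABORT":
            reorder_buffer.pop(xid, None)  # aborted changes are never sent
        else:
            reorder_buffer[xid].append(action)
    return emitted

# Two transactions whose changes are interleaved in the WAL.
wal = [(1, "INSERT"), (2, "UPDATE"), (1, "DELETE"), (2, "COMMIT"), (1, "COMMIT")]
for callback in decode(wal):
    print(callback)
```

Transaction 2 commits first, so its changes are emitted first and as one contiguous unit, even though transaction 1 started writing earlier.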

Streaming Replication and Logical Replication

• Logical Replication has been under development since 9.4
• Many common components

[Figure: components shared by Physical Replication and Logical Replication: wal sender, replication slot, synchronous_commit, monitoring, replication lag, synchronous replication]

Common components

• wal sender process
• Replication slot
  • keeping WAL segments, logical decoding plugin
• pg_stat_replication
  • Same monitoring interface

Many New Components Are Required

• Decoded WAL receiver (apply worker)
• Management of apply workers (logical replication launcher)
• Logical replication protocol
• Snapshot builder
• Replication origin
• Reorder buffer
• Initial table synchronization
• Relation mapping
• etc.

Known Restrictions

• Utility commands are not supported
  • DDLs, two-phase commit entries, etc.
• Not "streaming" logical replication
  • Decoded WAL is replicated at commit
• When using synchronous replication
  • Can only be configured at server level (not subscription level)
  • Waits for unsubscribed servers
• "UPDATE OF" trigger
• Concurrency restrictions
  • Concurrent CREATE SUBSCRIPTION
    • Workaround
  • Concurrent ALTER SUBSCRIPTION REFRESH PUBLICATION

For Further Enhancement

• Ease restrictions
  • DDL and utility command replication
• Streaming logical replication
• Online major version upgrade tool
• Conflict monitoring/management
• Built-in automatic fail-over


SUMMARY

PostgreSQL Replication is AWESOME

• Physical Replication is 10 years old
  • "Sea change" feature
  • High functionality
  • Mature
  • Continues to evolve
• Logical Replication is 0 years old
  • Also a "sea change" feature
  • Has enormous potential
  • Some restrictions
  • Shares common components with streaming replication

THANK YOU !!

Masahiko Sawada
mail: [email protected]

https://blog.2ndquadrant.com/bdr-history-and-future/

• Has been under development since PostgreSQL 9.2
• Incremental development

Taking backups on a standby server

• archive_mode = [ on | always | off ]
• Enables online backup using pg_start_backup() and pg_stop_backup()