MySQL Group Replication

Apr 13, 2017

Kenny Gryp
Transcript
Page 1: MySQL Group Replication

 

MySQL Group Replication

Kenny Gryp (@gryp)

1 / 65

Page 2: MySQL Group Replication

Table of Contents

1. Overview
2. Provisioning Nodes
3. Configuration
4. Monitoring
5. Backups
6. Load Balancers
7. Improvements

2 / 65

Page 3: MySQL Group Replication

Apologies upfront. I was able to spend only limited time researching Group Replication, and my knowledge is far more limited than that of the many MySQL (Group Replication) developers in this room. I may have made wrong assumptions, or I may discuss problems and missing features that are likely known and on the roadmap to be fixed or developed in reasonable time. Group Replication is quite a new feature and only recently became GA. I hereby give full responsibility to the MySQL developers to respond whenever I make a wrong statement :-), but please keep in mind that my talk is only 25 minutes long.

Even though MySQL Group Replication is marked GA, it is still a new feature, and database software adoption usually takes a long time. The bugs mentioned in these slides (some of which are not verified yet) are not here to tell you the feature is not good. I believe it is my duty as a member of the community to provide feedback, and getting the opportunity to talk in this room with a lot of Oracle employees clearly demonstrates that this is their desire as well, even though it does scare me quite a lot!

This talk describes the status of Group Replication on 31/01/2017.

3 / 65

Page 4: MySQL Group Replication

MySQL Group Replication

Overview

4 / 65

Page 5: MySQL Group Replication

MySQL Asynchronous Replication

5 / 65

Page 6: MySQL Group Replication

MySQL Group Replication

6 / 65

Page 7: MySQL Group Replication

Quick Overview

Writes in entire Group Replication executed in 'Global Total Order'
Majority consensus (Paxos Mencius)
  Writes will be received/accepted by majority of the nodes
  No guarantee all nodes have received a trx before application gets OK back
Optimistic Locking: Conflict Detection after replicating trx: 'Certification'
  First Committer Wins
Every node has all data, cluster is 'as fast as slowest node'.
Nodes can join/leave cluster

7 / 65

Page 8: MySQL Group Replication

Properties

No concept of master/slave, only 'members'
Durability: No data loss when nodes fail. Does not accept writes if there is no Quorum.
Active:Active Master: All nodes can be configured to accept writes at same time *
No (time consuming) failover is necessary, every member can become a writer member at any time
Added latency to every transaction COMMIT.

8 / 65

Page 9: MySQL Group Replication

MySQL InnoDB Cluster

9 / 65

Page 10: MySQL Group Replication

MySQL InnoDB Cluster

10 / 65

Page 11: MySQL Group Replication

MySQL Group Replication

Main focus: Design & Usability
Performance & Stability was not yet analyzed

11 / 65

Page 12: MySQL Group Replication

Use Cases for Group Replication

Environments with strict durability requirements (no data loss if master member is lost)
Write to multiple nodes ('scalability' by splitting write/read workloads)
Improve failover time
...

12 / 65

Page 13: MySQL Group Replication

MySQL Group Replication

Provisioning Nodes

13 / 65

Page 15: MySQL Group Replication

GTID

Please note that Group Replication uses GTID

Take into account:
  Creating a cluster and provisioning nodes requires 'compatible' GTID sets
  Errant Transactions!

15 / 65

Page 16: MySQL Group Replication

Errant Transactions

Ensure there are no errant transactions before starting group replication:

    [ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 74dc6ab2-e1cc-11e6-92aa-08002789cd2e:1 > Group transactions: 72149827-e1cc-11e6-9daf-08002789cd2e:1, da7aba5e-dead-da7a-ba55-da7aba5e57ab:1-5'
    [ERROR] Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.'
    [Note] Plugin group_replication reported: 'To force this member into the group you can use the group_replication_allow_local_disjoint_gtids_join option'
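The check behind this error can be illustrated offline. The following is a minimal sketch, not the plugin's code: it parses two GTID sets and reports the transactions that exist locally but not in the group. The helper names `parse_gtid_set` and `errant_transactions` are hypothetical; the sample GTID sets are the ones from the error message above.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-3:7, uuid2:5' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            # A single number such as '7' is an interval of length one.
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def errant_transactions(local, group):
    """Return {uuid: [numbers]} executed locally but absent from the group."""
    local_set, group_set = parse_gtid_set(local), parse_gtid_set(group)
    errant = {}
    for uuid, numbers in local_set.items():
        extra = numbers - group_set.get(uuid, set())
        if extra:
            errant[uuid] = sorted(extra)
    return errant


local = "74dc6ab2-e1cc-11e6-92aa-08002789cd2e:1"
group = ("72149827-e1cc-11e6-9daf-08002789cd2e:1, "
         "da7aba5e-dead-da7a-ba55-da7aba5e57ab:1-5")
print(errant_transactions(local, group))
# {'74dc6ab2-e1cc-11e6-92aa-08002789cd2e': [1]}
```

On a live server the same comparison is normally done with the built-in `GTID_SUBTRACT()` function; the sketch only makes the set arithmetic visible.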

16 / 65

Page 17: MySQL Group Replication

Best Practice

Please never use group_replication_allow_local_disjoint_gtids_join
  once you use it, you always have to keep it on.
  there might have been writes to the individual node (GR not active)
    data consistency/split brain/data loss/...
#84728: GR failure at start still starts MySQL
#84733: not possible to start with super_read_only=1

17 / 65

Page 18: MySQL Group Replication

Starting Cluster

First choose the right node to bootstrap:

    mysql> select @@global.gtid_executed\G
    *************************** 1. row ***************************
    @@global.gtid_executed: 72149827-e1cc-11e6-9daf-08002789cd2e:1,
    740e1fd2-e1cc-11e6-a8ec-08002789cd2e:1-2,
    74dc6ab2-e1cc-11e6-92aa-08002789cd2e:1-2,
    da7aba5e-dead-da7a-ba55-da7aba5e57ab:1-399
    1 row in set (0.00 sec)

ensure compatible GTID sets or forget about it!
choose node with all GTIDs
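"Choose node with all GTIDs" can be read as: pick the node whose gtid_executed is a superset of every other node's set. A minimal sketch of that selection, assuming the gtid_executed strings have already been collected from each node; the function names and the node labels in the example are hypothetical.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-3:7, uuid2:5' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def contains(bigger, smaller):
    """True if every transaction in 'smaller' is also in 'bigger'."""
    return all(numbers <= bigger.get(uuid, set())
               for uuid, numbers in smaller.items())


def pick_bootstrap_node(gtid_executed_by_node):
    """Return a node holding all GTIDs, or None if the sets have diverged."""
    parsed = {node: parse_gtid_set(text)
              for node, text in gtid_executed_by_node.items()}
    for node, own in parsed.items():
        if all(contains(own, other) for other in parsed.values()):
            return node
    return None  # no node is a superset: GTID sets are not compatible


nodes = {
    "gr-1": "72149827-e1cc-11e6-9daf-08002789cd2e:1",
    "gr-2": "72149827-e1cc-11e6-9daf-08002789cd2e:1, "
            "da7aba5e-dead-da7a-ba55-da7aba5e57ab:1-399",
    "gr-3": "72149827-e1cc-11e6-9daf-08002789cd2e:1, "
            "da7aba5e-dead-da7a-ba55-da7aba5e57ab:1-120",
}
print(pick_bootstrap_node(nodes))  # gr-2
```

A return value of None corresponds to the "forget about it" case: at least two nodes each hold transactions the other lacks.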

18 / 65

Page 20: MySQL Group Replication

Cluster Membership Operations

Start a new cluster:

    SET GLOBAL group_replication_bootstrap_group=on;
    START GROUP_REPLICATION;
    SET GLOBAL group_replication_bootstrap_group=off;

Restore (GTID-enabled) Backup

    SET GLOBAL group_replication_group_seeds='node1,node2,node3';
    START GROUP_REPLICATION;

#84674: unresolved hostnames block GR from starting

20 / 65

Page 21: MySQL Group Replication

MySQL Group Replication

Configuration

21 / 65

Page 22: MySQL Group Replication

Configuration Requirements

    [mysqld]
    log-bin
    binlog-format=row
    binlog-checksum=NONE
    gtid-mode=ON
    log-slave-updates
    master-info-repository=TABLE
    relay-log-info-repository=TABLE
    transaction-write-set-extraction=XXHASH64

Group Replication Configuration:

    group_replication_group_name="da7aba5e-dead-da7a-ba55-da7aba5e57ab"
    group_replication_local_address="gr-2:24901"
    group_replication_group_seeds="gr-1:24901,gr-2:24901,gr-3:24901"
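A configuration file can be checked against this list before a node is started. The following is a rough sketch using Python's standard `configparser`; the `REQUIRED` table only mirrors the settings shown on this slide, and the function name `check_mysqld_section` is hypothetical. It is not a replacement for the checks done by MySQL Shell.

```python
import configparser

# Settings from the slide above. None means "must be present, any value".
REQUIRED = {
    "log-bin": None,
    "binlog-format": "row",
    "binlog-checksum": "none",
    "gtid-mode": "on",
    "log-slave-updates": None,
    "master-info-repository": "table",
    "relay-log-info-repository": "table",
    "transaction-write-set-extraction": "xxhash64",
}


def check_mysqld_section(cnf_text):
    """Return the sorted names of required options that are missing or wrong."""
    parser = configparser.ConfigParser(
        allow_no_value=True, interpolation=None, strict=False)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return sorted(REQUIRED)
    # MySQL accepts dashes and underscores interchangeably in option names.
    found = {name.replace("_", "-").lower(): value
             for name, value in parser.items("mysqld")}
    problems = []
    for name, wanted in REQUIRED.items():
        if name not in found:
            problems.append(name)
        elif wanted is not None:
            actual = (found[name] or "").strip().strip('"').lower()
            if actual != wanted:
                problems.append(name)
    return sorted(problems)


sample = """
[mysqld]
log-bin
binlog_format=STATEMENT
binlog-checksum=NONE
gtid-mode=ON
"""
print(check_mysqld_section(sample))
```

For the sample above, the function reports `binlog-format` (wrong value) plus the four options that are absent.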

22 / 65

Page 23: MySQL Group Replication

MySQL InnoDB Cluster

No Security/Authentication is described in these slides
Possible to create a cluster in the MySQL Shell w. AdminAPI
  Also performs configuration checks

23 / 65

Page 25: MySQL Group Replication

Other Requirements/Limitations

Required:
  InnoDB Required
  PK on every table

Not supported:
  Transaction Savepoints
    #84799: mysqldump --single-transaction uses savepoints, does not work with GR
  In multi-writer/active:active
    Concurrent DDL vs DML/DDL operations

25 / 65

Page 26: MySQL Group Replication

Oracle's (Valid) Recommendations

Only use group_replication_single_primary_mode=ON
  write to a single node only
Not recommended for WAN (Probably because of Majority Consensus in Paxos Mencius)
Requires an odd number of nodes for proper Quorum
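The recommendation for an odd number of nodes follows from simple majority arithmetic. A small sketch (the function names are illustrative only) showing that adding a fourth or sixth node does not increase the number of failures the group can survive:

```python
def majority(group_size):
    """Smallest number of members that forms a majority."""
    return group_size // 2 + 1


def tolerated_failures(group_size):
    """How many members may fail while a majority remains."""
    return group_size - majority(group_size)


def has_quorum(reachable_members, group_size):
    """True if the reachable members still form a majority of the group."""
    return reachable_members >= majority(group_size)


for size in range(2, 8):
    print(size, majority(size), tolerated_failures(size))
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
# 7 4 3
```

With 4 nodes, a 2/2 network split leaves neither side with a majority, which is why an even group size adds cost without adding fault tolerance.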

26 / 65

Page 32: MySQL Group Replication

My Recommendations - my.cnf

Do not use loose- even though it is mentioned in the manual
  #84631: installation documentation issues
group_replication_allow_local_disjoint_gtids_join=OFF
Single writer mode? group_replication_auto_increment_increment=7 default is too high. Set to 1.
group_replication_bootstrap_group=OFF
group_replication_start_on_boot=ON
  #84728: GR failure at start still starts MySQL

32 / 65
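Why an increment of 7 is wasteful with a single writer can be seen by generating the values one node would hand out. This is a sketch of the usual offset-plus-multiples-of-increment behaviour; the offset of 1 and the signed 32-bit INT limit are assumptions chosen for the example.

```python
from itertools import islice

INT_MAX = 2**31 - 1  # upper bound of a signed INT column (assumed column type)


def auto_increment_values(offset, increment):
    """Yield the ids a single writer generates: offset, offset+increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


def id_capacity(offset, increment, max_value=INT_MAX):
    """Number of ids available before the column overflows."""
    return (max_value - offset) // increment + 1


print(list(islice(auto_increment_values(1, 7), 5)))  # [1, 8, 15, 22, 29]
print(id_capacity(1, 1))  # 2147483647
print(id_capacity(1, 7))  # 306783379
```

With one writer and an increment of 7, six out of every seven id values are skipped, so the column fills up seven times faster than with an increment of 1.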

Page 35: MySQL Group Replication

My Recommendations

Ensure all FQDN hostnames are resolvable.
  #84674: unresolved hostnames block GR from starting
  @@global.hostname is used by other members

Dangerous to issue:

    SET GLOBAL read_only=OFF;
    SET GLOBAL super_read_only=OFF;
    STOP GROUP_REPLICATION;

  #84795: STOP GROUP_REPLICATION sets super_read_only=off

35 / 65

Page 36: MySQL Group Replication

My Failed Recommendation

In an attempt to prevent split brain and because of:
  #84728: GR failure at start still starts MySQL
I tried to enforce super_read_only=1 at boot, but that failed too:
  #84733: not possible to start with super_read_only=1

I did not find a way to prevent a MySQL node from starting as an individual r/w MySQL server when Group Replication failed to start.

36 / 65

Page 37: MySQL Group Replication

MySQL Group Replication

Monitoring

(Profiling, Trending, Alerting, Status, Troubleshooting)

37 / 65

Page 38: MySQL Group Replication

Performance Schema

    SELECT TABLE_NAME FROM information_schema.TABLES
    WHERE TABLE_SCHEMA='performance_schema' AND TABLE_NAME LIKE '%replication%';
    +-------------------------------------------+
    | TABLE_NAME                                |
    +-------------------------------------------+
    | replication_applier_configuration         |
    | replication_applier_status                |
    | replication_applier_status_by_coordinator |
    | replication_applier_status_by_worker      |
    | replication_connection_configuration      |
    | replication_connection_status             |
    | replication_group_member_stats            |
    | replication_group_members                 |
    +-------------------------------------------+

2 replication appliers:

group_replication_applier <- group replication

38 / 65

Page 39: MySQL Group Replication

Trending - SHOW GLOBAL STATUS

Limited Status Information (which are usually easy to gather):

    mysql> show global status like '%group%';
    +----------------------------------+--------------------------------------+
    | Variable_name                    | Value                                |
    +----------------------------------+--------------------------------------+
    | Com_group_replication_start      | 0                                    |
    | Com_group_replication_stop       | 0                                    |
    | group_replication_primary_member | 72149827-e1cc-11e6-9daf-08002789cd2e |
    +----------------------------------+--------------------------------------+

39 / 65

Page 40: MySQL Group Replication

Trending - PFS

    mysql> select * from replication_group_member_stats\G
    *************************** 1. row ***************************
    CHANNEL_NAME: group_replication_applier
    VIEW_ID: 14860449946972589:2
    MEMBER_ID: 74dc6ab2-e1cc-11e6-92aa-08002789cd2e
    COUNT_TRANSACTIONS_IN_QUEUE: 0 # Certification queue
    COUNT_TRANSACTIONS_CHECKED: 4
    COUNT_CONFLICTS_DETECTED: 0
    COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
    TRANSACTIONS_COMMITTED_ALL_MEMBERS: 72149827-e1cc-11e6-9daf-08002789cd2e:1,
    740e1fd2-e1cc-11e6-a8ec-08002789cd2e:1-2,
    74dc6ab2-e1cc-11e6-92aa-08002789cd2e:1-2,
    da7aba5e-dead-da7a-ba55-da7aba5e57ab:1-444:1000041-1000503:2000041-
    LAST_CONFLICT_FREE_TRANSACTION: da7aba5e-dead-da7a-ba55-da7aba5e57ab:444
    1 row in set (0.00 sec)

40 / 65

Page 41: MySQL Group Replication

Group Replication Status

    mysql> select * from replication_connection_status\G
    *************************** 1. row ***************************
    CHANNEL_NAME: group_replication_applier
    GROUP_NAME: da7aba5e-dead-da7a-ba55-da7aba5e57ab
    SOURCE_UUID: da7aba5e-dead-da7a-ba55-da7aba5e57ab
    THREAD_ID: NULL
    SERVICE_STATE: ON
    COUNT_RECEIVED_HEARTBEATS: 0
    LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
    RECEIVED_TRANSACTION_SET: 72149827-e1cc-11e6-9daf-08002789cd2e:1,
    740e1fd2-e1cc-11e6-a8ec-08002789cd2e:1-2,
    74dc6ab2-e1cc-11e6-92aa-08002789cd2e:1-2,
    da7aba5e-dead-da7a-ba55-da7aba5e57ab:1-444:1000041-1000503:2000041-2000648
    LAST_ERROR_NUMBER: 0
    LAST_ERROR_MESSAGE:
    LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00

41 / 65

Page 42: MySQL Group Replication

Group Replication Members

    mysql> select * from replication_group_members;
    +--------------------------------------+------+------+--------+
    | ID                                   | HOST | PORT | STATE  |
    +--------------------------------------+------+------+--------+
    | 72149827-e1cc-11e6-9daf-08002789cd2e | gr-1 | 3306 | ONLINE |
    | 74dc6ab2-e1cc-11e6-92aa-08002789cd2e | gr-3 | 3306 | ONLINE |
    +--------------------------------------+------+------+--------+
    2 rows in set (0.00 sec)
    # slightly modified output

#84796: GR Member status is wrong

42 / 65

Page 43: MySQL Group Replication

Group Replication Read or Write.

    mysql> select @@global.super_read_only;
    +--------------------------+
    | @@global.super_read_only |
    +--------------------------+
    |                        1 |
    +--------------------------+
    1 row in set (0.05 sec)

43 / 65

Page 44: MySQL Group Replication

Group Replication Lag

    SELECT sys.gtid_count(
      GTID_SUBTRACT(
        (SELECT Received_transaction_set
           FROM performance_schema.replication_connection_status
          WHERE Channel_name = 'group_replication_applier'),
        (SELECT @@global.GTID_EXECUTED)
      )
    )

Thanks to @lefred: https://github.com/lefred/mysql_gr_routing_check/

44 / 65

Page 45: MySQL Group Replication

Commands

    mysql> SHOW SLAVE STATUS FOR CHANNEL 'group_replication_recovery'\G

    mysql> SHOW SLAVE STATUS FOR CHANNEL 'group_replication_applier'\G
    ERROR 3139 (HY000): SHOW SLAVE STATUS cannot be performed on channel 'group_replication_applier'

45 / 65

Page 46: MySQL Group Replication

Member State Changes

Can use improvements:
  #84796: GR Member status is wrong
  #84798: Group Replication can use some verbosity in the error log

46 / 65

Page 47: MySQL Group Replication

Multi Node Conflicts

Optimistic Locking
First committer wins
#84730: ability to troubleshoot transaction rollbacks

    ERROR 3101 (HY000) at line 1: Plugin instructed the server to rollback the current transaction.
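The "first committer wins" rule behind this rollback can be modelled in a few lines. This is a toy model under simplifying assumptions, not the plugin's implementation: a transaction carries the global sequence number it had seen when it executed (its snapshot) and the set of row keys it wrote, and transactions are certified in the agreed total order. The class name `Certifier` and the row-key tuples are hypothetical.

```python
class Certifier:
    """Toy certification: remember who last wrote each row, in total order."""

    def __init__(self):
        self.sequence = 0      # number of transactions certified so far
        self.last_write = {}   # row key -> sequence number of its last writer

    def certify(self, snapshot, write_set):
        """Return True to commit, False to roll back.

        snapshot: sequence number the transaction had seen when it executed.
        write_set: iterable of row keys the transaction modified.
        """
        write_set = list(write_set)
        # Conflict: some row was changed by a transaction this one never saw.
        if any(self.last_write.get(key, 0) > snapshot for key in write_set):
            return False
        self.sequence += 1
        for key in write_set:
            self.last_write[key] = self.sequence
        return True


certifier = Certifier()
row = ("test.t1", 1)  # hypothetical (table, primary key) pair

# Two members update the same row concurrently; both executed at snapshot 0.
print(certifier.certify(0, [row]))  # True  (first committer wins)
print(certifier.certify(0, [row]))  # False (rolled back, as in ERROR 3101)
# A later transaction that has seen the first commit succeeds.
print(certifier.certify(1, [row]))  # True
```

The second transaction only learns about the conflict at COMMIT time, which is why the application sees a rollback instead of a lock wait.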

47 / 65

Page 48: MySQL Group Replication

MySQL Group Replication

Backups

48 / 65

Page 50: MySQL Group Replication

Backups

It's just InnoDB, use your favorite GTID supported backup tool:
  Percona XtraBackup
  MySQL Enterprise Backup
  mysqldump
  mysqlpump
  mydumper

but...

#84799: mysqldump --single-transaction uses savepoints, does not work with GR
mydumper --use-savepoints is also affected

50 / 65

Page 51: MySQL Group Replication

MySQL Group Replication

Load Balancers

51 / 65

Page 54: MySQL Group Replication

MySQL Router (Beta)

MySQL Router is part of MySQL InnoDB Cluster

I briefly evaluated, but quickly ran into serious problems:
  missing a lot of features
  pretty unknown in the community...:
    #83237: mysqlrouter connects to wrong metadata server (a partitioned node)
  Visibility:
    #83236: How to see mysqlrouter membership status?

I might consider re-evaluating, but first ^^

54 / 65

Page 55: MySQL Group Replication

ProxySQL

Really Open Source!
Becoming very popular
Example Implementation: http://lefred.be/content/ha-with-mysql-group-replication-and-proxysql/

Careful: #2: using multiple hostgroups/schedulers with proxysql_groupreplication_checker.sh can cause unwanted state changes

55 / 65

Page 56: MySQL Group Replication

MySQL Group Replication

Improvements

56 / 65

Page 57: MySQL Group Replication

#84784 - Nodes Do Not Reconnect

Nodes do not reconnect to the group once they get disconnected. This causes nodes to drop from the cluster and can lead to losing availability of the whole cluster.

57 / 65

Page 60: MySQL Group Replication

Improvements

Reduce impact on applications:
  #84731: mysql client connections get stuck during GR start

Partition Tolerance issues, split brain cannot be prevented:
  #84727: partitioned nodes still accept writes: queries hang
  #84728: GR failure at start still starts MySQL
  #84729: block reads on partitioned nodes *
  #84733: not possible to start with super_read_only=1
  #84784: Nodes Do Not Reconnect
  #84795: STOP GROUP_REPLICATION sets super_read_only=off

60 / 65

Page 62: MySQL Group Replication

Improvements

Stability:
  #84785: Prevent Large Transactions in Group Replication
  #84792: Member using 100% CPU in idle cluster
  #84796: GR Member status is wrong

Usability:
  #84674: unresolved hostnames block GR from starting
  #84794: cannot kill query that is stuck inside GR
  #84799: mysqldump --single-transaction uses savepoints, does not work with GR
  #84798: Group Replication can use some verbosity in the error log

62 / 65

Page 63: MySQL Group Replication

For Every Bug Fixed: 1 Beer

63 / 65

Page 64: MySQL Group Replication

MySQL Group Replication

Summary

64 / 65

Page 65: MySQL Group Replication

Summary

Use Cases for Group Replication

Environments with strict durability requirements
  Ensure split-brain can be completely avoided
Write to multiple nodes ('scalability' by splitting write/read workloads)
  Not recommended (yet)
Improve failover time
  Reducing impact on applications can be improved
...

65 / 65