Page 1
2012 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
TechEvent April 2013
Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
Ludovico CALDARA
LS-IMS
27.04.2013
27.04.2013 Galera Cluster Synchronous Multi-Master Replication for MySQL HA
Page 2
MySQL forks: which one is better?
• New forks
• Many new features
• Improved instrumentation
• New solutions for DEVs and DBAs
• Fast-paced competition between forks’ developers
• Recent evolutions in HA and scalability have made MySQL enterprise ready
[Diagram: MySQL and its forks: Oracle MySQL, Percona Server, MariaDB, Drizzle]
Page 3
There is no recipe that can satisfy all tastes
Feature                    Percona Server     MariaDB            MySQL
Multi-source replication   NO                 YES (rel. 10)      NO
NoSQL integration          YES (Cassandra)    YES (Cassandra)    YES (memcached)
Virtual columns            NO                 YES                NO
Improved diagnostics       YES                NO                 NO
Online DDL                 NO                 YES                YES
Galera Cluster             YES                YES                YES (Codership patch)
Many, many others          YES/NO             YES/NO             YES/NO
Page 4
Your real requirements will let you choose… Need HA?
• How will your customer react to a major loss of service?
Page 5
Old-school solutions have weaknesses
Native MySQL Replication
• Doesn’t scale writes
• Complex to promote slaves
MySQL Multi-Master Replication
• Complex and not reliable
• Concurrent writes lead to logical corruption
DRBD Replication
• Standby is offline, doesn’t scale at all
• Poor performance
MySQL Cluster (NDB)
• Very complex
• It’s not InnoDB!
Page 6
New-school solutions: 3rd parties are playing a decisive role
Continuent Tungsten Replicator
• Similar to GoldenGate
• Heterogeneous databases
• Provides complex topologies
• Asynchronous
• Conflicts are complex to resolve
• Complex to maintain
• Not free
Galera Cluster Replication
• Transparent multi-master, easy to maintain
• (Virtually) synchronous
• It’s InnoDB (only InnoDB)
• Great and easy scalability
• Optimistic locking (side effects)
• At least 3 nodes for good HA
Page 7
Multi-Master and virtually synchronous: it’s transparent
[Diagram: five cluster nodes, each accepting reads and writes (R/W)]
Page 8
Cluster implementation - Ingredients
• One or more standalone servers (either physical or virtual)
• Linux (other operating systems are not yet available)
• “Permissive” Firewall between nodes
• Codership’s Galera Library package
• A package of your choice:
  • Percona XtraDB Cluster
  • MariaDB Galera Cluster
  • MySQL with wsrep patch (patched by Codership)
Page 9
Cluster implementation - Variables
• Each server’s my.cnf must contain:
  • wsrep_cluster_address=gcomm://192.168.1.100,…,192.168.1.10x
  • wsrep_provider=/usr/lib64/libgalera_smm.so
  • binlog_format=ROW
  • default_storage_engine=InnoDB
  • innodb_autoinc_lock_mode=2
  • innodb_locks_unsafe_for_binlog=1  # disables gap locking
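A minimal sketch of how the settings above could be checked before starting a node. The function name `check_galera_settings` is hypothetical and not part of any Galera tool. It assumes the options sit in a `[mysqld]` section and are written with underscores, and it does not handle `!include` or `!includedir` directives.

```python
import configparser

# Values listed on the slide above.
REQUIRED_VALUES = {
    "binlog_format": "ROW",
    "default_storage_engine": "InnoDB",
    "innodb_autoinc_lock_mode": "2",
    "innodb_locks_unsafe_for_binlog": "1",
}
# Site-specific options: only their presence is checked.
REQUIRED_PRESENT = ("wsrep_cluster_address", "wsrep_provider")


def check_galera_settings(cnf_text):
    """Return a list of problems found in the [mysqld] section (hypothetical helper)."""
    parser = configparser.ConfigParser(
        allow_no_value=True,        # my.cnf allows options without a value
        strict=False,               # tolerate repeated options
        interpolation=None,         # do not treat '%' specially
        inline_comment_prefixes=("#",),
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    section = parser["mysqld"]
    problems = []
    for key in REQUIRED_PRESENT:
        if not section.get(key):
            problems.append(f"{key} is not set")
    for key, expected in REQUIRED_VALUES.items():
        actual = section.get(key)
        if actual is None or actual.strip().lower() != expected.lower():
            problems.append(f"{key} should be {expected}, found {actual}")
    return problems
```

An empty list means that all the options from the slide were found with the expected values.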
Page 10
Cluster implementation – Start the cluster
mysqld_safe --wsrep_cluster_address=gcomm:// &
[…] 130220 17:56:46 [Note] WSREP: Starting new group from scratch: […]
The empty gcomm:// address starts the node as the first of the cluster
NEVER USE IT TO JOIN AN EXISTING CLUSTER
Page 11
Cluster implementation – Adding nodes to the cluster
mysqld_safe \
--wsrep_cluster_address=gcomm://host1,host2… &
[…]
130220 18:01:56 [Note] WSREP: Shifting OPEN -> PRIMARY (TO:…)
130220 18:01:56 [Note] WSREP: State transfer required: […]
The address should already be present in my.cnf!
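The difference between bootstrapping and joining can be guarded in a start script. The helper below is a hypothetical sketch, not part of Galera: it only builds the `gcomm://` string and refuses to produce an empty address unless a bootstrap is explicitly requested.

```python
def wsrep_cluster_address(peers, bootstrap=False):
    """Build the gcomm:// address for a node (hypothetical helper).

    An empty address starts a brand-new cluster, so it is produced only
    when bootstrap=True and no peers are listed.
    """
    peers = [p.strip() for p in peers if p.strip()]
    if bootstrap:
        if peers:
            raise ValueError("a bootstrap node must not list peers")
        return "gcomm://"
    if not peers:
        raise ValueError("refusing an empty gcomm:// for a joining node")
    return "gcomm://" + ",".join(peers)
```

The returned string is what would be passed to `--wsrep_cluster_address` or written to my.cnf.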
Page 12
State Snapshot Transfer (SST)
[Diagram: three R/W nodes; one of them acts as DONOR while the joiner is loaded]
• The joiner asks for an SST
• The cluster chooses a donor; the donor is taken offline
• The donor is backed up
• The donor comes online again and the joiner is loaded
• The joiner replays the missing transactions and joins the cluster
• The cluster can also do Incremental State Transfers (IST)
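When an incremental transfer is possible can be sketched as follows. This is a simplified model, not Galera's code: the function and parameter names are hypothetical. It assumes that the joiner knows the cluster state UUID and the last sequence number it applied, and that the donor still caches every transaction from `donor_cache_first_seqno` onwards.

```python
def choose_state_transfer(joiner_uuid, joiner_seqno, cluster_uuid, donor_cache_first_seqno):
    """Return "IST" if the donor can replay what the joiner is missing, else "SST"."""
    if joiner_uuid != cluster_uuid:
        return "SST"  # the joiner's data belongs to a different cluster history
    if joiner_seqno < 0:
        return "SST"  # the joiner's position is unknown, e.g. after a crash
    if joiner_seqno + 1 < donor_cache_first_seqno:
        return "SST"  # the first missing transaction is no longer cached on the donor
    return "IST"
```

In this model a node that was down only briefly gets an IST, while a new or long-absent node needs a full snapshot.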
Page 17
Split-Brain
• The majority of nodes wins
• Complete loss of network: all nodes go offline
• The offline nodes will respond:
  mysql> select * from emp;
  ERROR 1047 (08S01): Unknown command
• Galera arbitrator (garbd) can join the cluster and count as a member in split brain resolution.
• NEW: Galera 2.4 introduces weighted quorum
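The majority rule can be illustrated with a small sketch. It is a simplification, assuming that a partition stays primary only if it holds strictly more than half of the total weight, and that the arbitrator counts like any other member. The real computation also accounts for nodes that left the cluster gracefully. The function name is hypothetical.

```python
def has_quorum(reachable_weights, all_weights):
    """True if the reachable members hold strictly more than half of the total weight."""
    return 2 * sum(reachable_weights) > sum(all_weights)


# Two data nodes plus one arbitrator, each with weight 1.
cluster = [1, 1, 1]
one_node_plus_arbitrator = has_quorum([1, 1], cluster)  # majority: stays online
isolated_node = has_quorum([1], cluster)                # minority: goes offline
even_split = has_quorum([1, 1], [1, 1, 1, 1])           # no majority on either side
```

With an even split neither side has a majority, which is why an odd number of voting members (or an arbitrator) is advisable.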
Page 18
Example 1: Arbitrator in Trivadis Switzerland
… apologies to the German and Austrian attendees ☺
[Diagram: sites in Zurich, Basel, Bern and Lausanne connected over a WAN, with an arbitrator]
• If the WAN connection is lost, Zurich survives
• If the Zurich site is lost, the cluster goes offline
Page 19
Example 2: Arbitrator in Trivadis Switzerland
… apologies to the German and Austrian attendees ☺
[Diagram: sites in Zurich, Basel, Bern and Lausanne connected over a WAN, with an arbitrator]
• If the Zurich site is lost, the other sites survive
• If the WAN connection is lost, the cluster goes offline
Page 20
What does “Virtually synchronous” mean? In brief:
[Diagram: the client writes on one node; at commit, the write-set (WS) is sent to the other nodes, then the client receives “Commit OK”]
• Writes are as fast as if they were local
• A commit takes about one network round trip: if that latency is acceptable, the cluster can be spread geographically
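A rough worked example of what the round trip means for a single connection. It assumes that each commit waits for one full round trip and that commits on the same connection are issued one after another. Parallel connections overlap their waits, so the aggregate throughput of the cluster can be much higher. The function name and parameters are hypothetical.

```python
def max_sequential_commits_per_second(rtt_ms, local_commit_ms=0.0):
    """Upper bound on commits per second for ONE connection committing back to back."""
    per_commit_ms = rtt_ms + local_commit_ms
    if per_commit_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / per_commit_ms


lan = max_sequential_commits_per_second(0.5)  # 0.5 ms round trip inside one site
wan = max_sequential_commits_per_second(50)   # 50 ms round trip between distant sites
```

Under these assumptions a 0.5 ms round trip allows at most 2000 sequential commits per second on one connection, while a 50 ms round trip allows at most 20.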
Page 25
Optimistic locking leads to side effects
Session 1:
mysql> update emp set salary='peanuts' where name='Caldara';
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Session 2 (on another node):
mysql> update emp set salary='one billion' where name='Caldara';
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> commit;
Query OK, 0 rows affected (0.01 sec)
[The write-set (WS) of session 2 is sent to the other nodes]

Session 1:
mysql> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Afterwards:
mysql> select salary from emp where name='Caldara';
+-------------+
| salary      |
+-------------+
| one billion |
+-------------+
Page 29
Conclusions on optimistic locking…
• Locally, the first transaction to acquire the lock wins (it’s InnoDB…)
• Cluster-wide, the first transaction to broadcast its commit wins (it’s Galera…)
• The application should not have hotspots…
• … or it should retry the transaction after the deadlock occurs…
• … or, for each database, you can elect one node as the master
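The retry option can be sketched as below. This is a minimal illustration, not tied to any real driver: `DatabaseError` is a stand-in exception, and `transaction` is assumed to be a callable that runs the whole transaction, commit included, from the beginning. Only error 1213, the deadlock error returned at commit in the conflict scenario, is retried.

```python
import time

DEADLOCK_ERRNO = 1213  # MySQL error code for "Deadlock found when trying to get lock"


class DatabaseError(Exception):
    """Stand-in for a driver exception; real drivers expose the error code differently."""

    def __init__(self, errno, message=""):
        super().__init__(message)
        self.errno = errno


def run_with_retry(transaction, max_attempts=3, backoff_seconds=0.0):
    """Run transaction(); on a deadlock error, run it again from the start."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Re-raise anything that is not a deadlock, or the last failed attempt.
            if exc.errno != DEADLOCK_ERRNO or attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)
```

The whole transaction is re-executed rather than just the commit, because the losing transaction has been rolled back.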
Page 30
About performance
• Commit performance loss is between 5% and 10% plus the network RTT
• Write workloads scale up to 8 nodes
• Beyond 8 nodes, it scales reads, not writes
• Many benchmarks show that Galera outperforms NDB with few nodes
• NDB scales out further with many nodes thanks to data sharding
• Benchmarks on the internet are not always reliable… test the performance of YOUR application
Page 31
How to migrate
• Convert all your tables to InnoDB
• Double-check that all tables have primary keys
• Think about potential problems caused by triggers (if you have any)
• Create a new, empty Galera cluster
• Set up MySQL native replication between the old database and the Galera cluster
• Once everything is aligned, direct your clients to the new cluster
• Set up the old node to join the cluster
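The first two steps can be prepared with a small script. The sketch below is hypothetical and works on table metadata that has already been collected, for example from `information_schema.tables` and `information_schema.table_constraints`. It only generates statements and a report, and runs nothing against the database.

```python
def migration_statements(tables):
    """Return (ALTER statements for non-InnoDB tables, tables without a primary key).

    `tables` is an iterable of dicts with the keys
    "schema", "name", "engine" and "has_primary_key" (assumed layout).
    """
    alters, missing_pk = [], []
    for t in tables:
        qualified = f"`{t['schema']}`.`{t['name']}`"
        if t["engine"].lower() != "innodb":
            alters.append(f"ALTER TABLE {qualified} ENGINE=InnoDB;")
        if not t["has_primary_key"]:
            missing_pk.append(qualified)
    return alters, missing_pk
```

Tables reported without a primary key need a manual decision, since the right key depends on the data model.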
Page 32
Load balancing
• HAProxy is the most used solution so far
• Codership is actively developing its own load balancer: Galera Load Balancer (glbd)
  • Several balancing modes: round robin, custom, least connected, …
  • Automatically drains disconnected nodes
  • New nodes can be added with a single TCP call
  • Release 1.0 (now rc1) will support a watchdog and automatic discovery of the nodes composing the cluster
• Other methods are possible (e.g. Java connector properties, hardware load balancer)
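Two of the balancing modes mentioned above can be illustrated as follows. This is a generic sketch of the selection logic only, not the implementation of glbd or HAProxy. The function names are hypothetical, and `available` is assumed to be the set of nodes currently connected to the cluster.

```python
def pick_least_connected(connections, available):
    """Pick the available node with the fewest open connections (ties: by name)."""
    candidates = [n for n in connections if n in available]
    if not candidates:
        raise RuntimeError("no node available")
    return min(candidates, key=lambda n: (connections[n], n))


def pick_round_robin(nodes, available, last_index):
    """Return (index, node) of the next available node after last_index."""
    for step in range(1, len(nodes) + 1):
        i = (last_index + step) % len(nodes)
        if nodes[i] in available:
            return i, nodes[i]
    raise RuntimeError("no node available")
```

Skipping nodes that are not in `available` is what draining a disconnected node amounts to in this model.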
Page 33
Conclusions on Galera Cluster
Strengths:
• Multi-master
• Shared-nothing
• Great performance and scalability
• «Virtually» synchronous
• It uses InnoDB!!
• Conflict prevention
• Split-brain (no inconsistencies)
• Easy to add/remove nodes
Limitations:
• At least 3 nodes to have good HA
• Optimistic locking (side effects)
• Explicit locking doesn’t work
• Only InnoDB is replicated
• Primary keys are mandatory
• Not yet available for MySQL 5.6
• Linux only
Page 34
Links
http://www.slideshare.net/skysql/galera-cluster-by-seppo-jaakola-codership-at-skysql-roadshow-in-stuttgart-2013
http://www.codership.com/files/presentations/Galera_Replication_PLL_2011.pdf
http://www.mysqlperformanceblog.com/2013/01/31/feature-in-details-incremental-state-transfer-after-a-node-crash-in-percona-xtradb-cluster/
http://www.percona.tv/percona-webinars/migrating-to-percona-xtradb-cluster
http://www.codership.com/content/5-tips-migrating-your-mysql-server-galera-cluster
http://www.mysqlperformanceblog.com/2012/08/17/percona-xtradb-cluster-multi-node-writing-and-unexpected-deadlocks/
http://www.mysqlperformanceblog.com/2012/11/20/understanding-multi-node-writing-conflict-metrics-in-percona-xtradb-cluster-and-galera/
http://www.mysqlperformanceblog.com/2011/10/13/benchmarking-galera-replication-overhead/
http://karlssonondatabases.blogspot.ch/2012/12/galera-features-beyond-just-ha.html
http://infoscience.epfl.ch/record/52305/files/IC_TECH_REPORT_199908.pdf
http://www.inf.usi.ch/faculty/pedone/Paper/2005/2005WDIDDR.pdf
Page 35
Little demo?
Page 36
?
Page 37
THANK YOU.
Trivadis SA
Ludovico Caldara
Senior Consultant
[email protected]