MySQL High Availability and Disaster Recovery with Continuent, a VMware company

© 2015 VMware Inc. All rights reserved.

MySQL High Availability and Disaster Recovery Featuring Continuent

Robert Hodges January 2015

Continuent Quick Introduction


History

2004 Continuent established in USA

2009 3rd Generation Continuent Tungsten (aka VMware Continuent) ships

2014 100+ customers running business-critical applications

Oct 2014 Acquisition by VMware: Now part of the vCloud Air Business Unit

Oct 2015 Continuent solutions available through VMware sales

Products

Industry-leading clustering and replication for open source DBMS

Clustering – Commercial-grade HA, performance scaling, and data management for MySQL

Replication – Flexible, high-performance data movement

Business-Critical Deployment Examples

High Availability for MySQL

Largest cluster deployment performs 800M+ transactions/day on 275 TB of relational data

Business Continuity

Cross-site cluster topologies widely deployed, including primary/DR and multi-master

High Performance Replication

Largest installations transfer billions of transactions daily using high speed, parallel replication

Heterogeneous Integration

Customers replicate from MySQL to Oracle, Hadoop, Redshift, Vertica, and others

Real-time Analytics

Optimized data loading for data warehouses, with deployments of up to 200 MySQL masters feeding Hadoop

Continuent Facts


Select Continuent Customers


Too Good To Be True

The Dream: multiple, active DBMS servers with identical data over distance

High Availability

Updates propagated immediately to all servers

Transparent read/write to any server

High Performance

Synchronous multi-master clusters claim to deliver on the dream

[Diagram: table foo holds id=1, data=6; an ordering step sequences transactions [1] id=1, data=6, [2] id=1, data=5, and [3] id=7, data=25.]

Synchronous multi-master introduces new problems


[Diagram: the ordering step sequences transactions [1] id=1, data=6, [2] id=1, data=5, and [3] id=7, data=25; a conflicting transaction is REJECTED!]

…That grow as data scale in volume and distance

• Transaction failures due to conflicts
• Operations like SELECT FOR UPDATE not supported
• Slow writes due to synchronous messaging
• Large transactions lock system or cause failures
• Cross-site replication is unstable
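The conflict failures in the first bullet can be illustrated with a small sketch. It assumes a first-committer-wins certification rule, in which a transaction is rejected if any row it writes was changed after the snapshot it started from. That rule, and the names `Certifier` and `certify`, are illustrative assumptions and are not taken from the slides or from any specific product.

```python
class Certifier:
    """Toy first-committer-wins certifier (an assumed rule, for illustration)."""

    def __init__(self):
        self.seqno = 0          # global order of committed transactions
        self.last_write = {}    # row key -> seqno of the last committed write

    def certify(self, snapshot_seqno, write_keys):
        """Return the commit seqno, or None if the transaction is rejected."""
        for key in write_keys:
            # Another node committed a write to this row after our snapshot.
            if self.last_write.get(key, 0) > snapshot_seqno:
                return None
        self.seqno += 1
        for key in write_keys:
            self.last_write[key] = self.seqno
        return self.seqno


certifier = Certifier()
# Two nodes update row id=1 concurrently, both starting from snapshot 0.
t1 = certifier.certify(0, [("foo", 1)])   # id=1, data=6
t2 = certifier.certify(0, [("foo", 1)])   # id=1, data=5, conflicts with t1
t3 = certifier.certify(0, [("foo", 7)])   # id=7, data=25, no conflict
print(t1, t2, t3)
```

Under this assumed rule the second update to id=1 is rejected and must be retried by the application, while the unrelated update to id=7 commits.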

Can master/slave clusters offer the same benefits?

High Availability?

Updates propagated immediately?

Transparent read/write to any server?

High Performance?

Continuent Clustering: HA, DR and Performance Scaling

Continuent Master/Slave Clusters

Benefits

• 24x7 data access
• SQL load balancing
• Simple management
• Off-the-shelf MySQL

[Diagram: application stacks connect through the Continuent Connector to three nodes: db1 (master), db2 and db3 (slaves). Each node runs a Manager and a Replicator.]

Continuent clusters add HA and scaling without taking features away

[Diagram: two Continuent Connectors in front of one master and two slaves]

Continuent Connector operates as an intelligent proxy to the DBMS

• Any MySQL client can connect
• Connector initiates connections on behalf of client to the DBMS

[Diagram: Application → Connector → MySQL master and two MySQL slaves. The application speaks the MySQL protocol to the Connector (COM_QUERY, COM_INIT_DB, COM_DROP_DB…).]

Connector minimizes overhead from proxying

• Pass-through operation after connection
• Full transparency and low overhead for clients

[Diagram: Application → Connector → MySQL master and two MySQL slaves. A COM_QUERY packet (SELECT * FROM foo) passes through the Connector, and an OK packet with a result set of 1 row is returned.]
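To make the packet labels concrete, here is a minimal sketch of how a MySQL command packet such as COM_QUERY is framed on the wire: a 3-byte little-endian payload length, a 1-byte sequence id, then a payload whose first byte is the command. The command byte values follow the public MySQL client/server protocol; the helper names `build_command_packet` and `parse_command_packet` are hypothetical, and nothing here describes how the Connector itself is implemented.

```python
import struct

# Command bytes from the MySQL client/server protocol.
COMMANDS = {0x02: "COM_INIT_DB", 0x03: "COM_QUERY", 0x06: "COM_DROP_DB"}


def build_command_packet(command, argument, sequence_id=0):
    """Frame a command byte plus its argument as one MySQL packet."""
    payload = bytes([command]) + argument.encode("utf-8")
    # Header: 3-byte little-endian payload length, then 1-byte sequence id.
    header = struct.pack("<I", len(payload))[:3] + bytes([sequence_id])
    return header + payload


def parse_command_packet(packet):
    """Return (command name, sequence id, argument) for a command packet."""
    length = int.from_bytes(packet[0:3], "little")
    sequence_id = packet[3]
    payload = packet[4:4 + length]
    name = COMMANDS.get(payload[0], "UNKNOWN")
    return name, sequence_id, payload[1:].decode("utf-8")


packet = build_command_packet(0x03, "SELECT * FROM foo")
parsed = parse_command_packet(packet)
print(parsed)
```

A proxy in pass-through mode only needs to read the 4-byte header to know how many bytes to forward, which is one plausible reason the overhead can stay low.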

Continuent SmartScale provides session load balancing

• Initial write goes to master
• Reads go to replicas if it is safe to do so

[Diagram: initial write. The application connects and inserts data through the Connector, which tracks the binlog position for session “X”. Master: write committed. Both slaves: not received.]


• Auto-commit reads are eligible to go to slave
• Reads stay on master until a slave catches up

[Diagram: select data. Master: write committed. First slave: not received. Second slave: received but not applied. Result: read from master, NO read from slave.]


• Reads go to slave when it has caught up with master
• Session tags may be schema name or supplied by application

[Diagram: select data. Master: write committed. First slave: received but not applied. Second slave: received and applied. Result: read from slave.]
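The three SmartScale steps can be sketched as a routing rule: remember the log position of each session's last write, and send an auto-commit read to a slave only if that slave has applied at least that position. This is a simplified illustration of the behavior described on the slides; the classes `Slave` and `SessionRouter` and the integer positions are assumptions, not Continuent APIs.

```python
class Slave:
    def __init__(self, name, applied_position=0):
        self.name = name
        self.applied_position = applied_position  # last position applied


class SessionRouter:
    """Hypothetical router that keeps reads consistent within a session."""

    def __init__(self, slaves):
        self.slaves = slaves
        self.master_position = 0
        self.session_position = {}  # session tag -> position of its last write

    def write(self, session):
        # Writes always go to the master and advance its log position.
        self.master_position += 1
        self.session_position[session] = self.master_position
        return "master"

    def read(self, session, autocommit=True):
        # Only auto-commit reads are eligible to leave the master.
        if not autocommit:
            return "master"
        needed = self.session_position.get(session, 0)
        for slave in self.slaves:
            if slave.applied_position >= needed:
                return slave.name
        return "master"  # no slave has caught up yet


slaves = [Slave("db2"), Slave("db3")]
router = SessionRouter(slaves)
first = router.write("X")          # initial write goes to master
before = router.read("X")          # slaves have not applied the write
slaves[1].applied_position = 1     # db3 receives and applies the write
after = router.read("X")           # read may now go to db3
print(first, before, after)
```

Keying the position by session tag rather than by connection is what lets a tag such as the schema name, or one supplied by the application, carry consistency across connections.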

Connectors can be configured to support different levels of service

[Diagram: one master and two slaves, each with a Manager and a Replicator; one Connector is labeled (SmartScale) and the other (Strict Consistency).]

Demo: Transparent connectivity to replicas

Failover and Maintenance

Continuent clusters automatically monitor all cluster nodes for failure


[Diagram: one master and two slaves]

Cluster rules fail over master if DBMS no longer accepts network connections


[Diagram: the master (marked X) has failed; two slaves remain]

1. Detect non-responsive node
2. Halt incoming connections
3. Find and promote most up-to-date slave
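Step 1 rests on the rule that a DBMS which no longer accepts network connections is treated as failed. A minimal sketch of such a check is a TCP connect with a timeout. The function name `accepts_connections` is hypothetical, and the slides do not say how Continuent's managers actually probe nodes; a local throwaway listener stands in for mysqld so the example is self-contained.

```python
import socket


def accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Demonstration against a throwaway local listener standing in for mysqld.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

up = accepts_connections("127.0.0.1", port)    # listener is running
listener.close()
down = accepts_connections("127.0.0.1", port)  # listener is gone
print(up, down)
```

A production monitor would likely repeat the probe and require several failures before acting, to avoid failing over on a transient network glitch.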

Failed nodes can be reprovisioned from a backup with a single management command


[Diagram: new master, one slave, and the old master (X) as a shunned node]

4. Administrator inspects and recovers old master
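The promotion step can be sketched as follows: shun the failed master, then pick the online slave that has applied the most transactions. The dictionary layout, the field names, and the function `fail_over` are assumptions made for illustration; halting incoming connections and the administrator's later recovery of the old master are outside the sketch.

```python
def fail_over(nodes, failed_master):
    """Promote the most up-to-date online slave; mutates nodes in place.

    nodes maps a node name to {"role", "state", "applied_seqno"}.
    """
    # The failed master is shunned so that nothing connects to it.
    nodes[failed_master]["state"] = "shunned"
    candidates = {
        name: info for name, info in nodes.items()
        if info["role"] == "slave" and info["state"] == "online"
    }
    if not candidates:
        raise RuntimeError("no online slave available for promotion")
    # The slave that has applied the most transactions loses the least data.
    new_master = max(candidates, key=lambda n: candidates[n]["applied_seqno"])
    nodes[new_master]["role"] = "master"
    return new_master


cluster = {
    "db1": {"role": "master", "state": "online", "applied_seqno": 120},
    "db2": {"role": "slave", "state": "online", "applied_seqno": 118},
    "db3": {"role": "slave", "state": "online", "applied_seqno": 120},
}
promoted = fail_over(cluster, "db1")
print(promoted, cluster["db1"]["state"])
```

In this example db3 is promoted because it is ahead of db2, and db1 stays shunned until someone inspects and recovers it.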

Continuent clusters support zero-downtime maintenance operations from parameter changes to app upgrade

• Task: change the InnoDB log file size
• Problem: requires a mysqld restart, hence can cause application downtime
• Constraint: avoid application-visible restart
• Solution: upgrade nodes in succession

Rolling maintenance proceeds node-by-node starting with slaves and proceeding to master

Slave upgrade

• Shun slave
• Resize journal, restart mysqld
• Return node to cluster
• Discard and reprovision on failure

Slave upgrade

• Repeat for remaining slave(s)

Switch

• Switch master to promote an upgraded slave

Master upgrade

• Upgrade old master
• Maintenance is now done!
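The ordering of a rolling maintenance run can be written down as a plan generator: every slave is taken out, changed, and returned before the master role is switched, so the old master is only restarted once it is no longer serving writes. The action labels ("shun", "upgrade", "recover", "switch") are descriptive tags chosen for this sketch, not actual management commands, and `rolling_maintenance_plan` is a hypothetical helper.

```python
def rolling_maintenance_plan(master, slaves):
    """Return the ordered (action, node) steps for a rolling upgrade."""
    if not slaves:
        raise ValueError("rolling maintenance needs at least one slave")
    plan = []
    for slave in slaves:
        plan.append(("shun", slave))      # take the slave out of the cluster
        plan.append(("upgrade", slave))   # e.g. resize log, restart mysqld
        plan.append(("recover", slave))   # return the node to the cluster
    # Promote an already upgraded slave so the old master can be restarted.
    plan.append(("switch", slaves[0]))
    plan.append(("shun", master))
    plan.append(("upgrade", master))
    plan.append(("recover", master))
    return plan


plan = rolling_maintenance_plan("db1", ["db2", "db3"])
for step in plan:
    print(step)
```

The sketch leaves out the failure branch (discard and reprovision); a fuller version would check each node after the upgrade step and reprovision it from a backup if the restart failed.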

Transaction Scaling with Master/Slave Topologies

Size and transaction activity on business data depend on many factors

[Chart: SaaS Datasets -- Size of Top 1000 Customers. Y axis: Dataset Size in Gigabytes (0 to 1400); X axis: Customers. Median=2.6GB, 99th percentile=290GB, Max=1214GB.]

Source: Statistics provided by Continuent customer

DBMS workloads are correspondingly varied

• Complex queries
• Large batch operations
• Small online transactions
• Analytic reports

[Diagram: one master and two slaves, each with a Manager and a Replicator]

Asynchronous replication decouples transaction processing on master and slave DBMS nodes

[Diagram: on the master, a Replicator reads the MySQL DBMS logs into its THL (transactions + metadata). On the slave, a Replicator downloads transactions via the network into its own THL (transactions + metadata) and applies them to MySQL using JDBC.]
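The decoupling can be sketched with two append-only logs: the master appends to its log without waiting for anyone, and the slave downloads and applies on its own schedule. `TransactionLog` stands in for the THL and a plain list stands in for the slave DBMS; both classes and their method names are assumptions made for illustration, not the Replicator's real interfaces.

```python
class TransactionLog:
    """Append-only log standing in for the THL."""

    def __init__(self):
        self.events = []  # list of (seqno, sql), seqno starts at 1

    def append(self, sql):
        seqno = len(self.events) + 1
        self.events.append((seqno, sql))
        return seqno

    def read_after(self, seqno):
        """Return all events with a sequence number greater than seqno."""
        return self.events[seqno:]


class SlaveReplicator:
    def __init__(self, master_log):
        self.master_log = master_log
        self.local_log = TransactionLog()
        self.applied = []        # stands in for the slave DBMS
        self.applied_seqno = 0

    def download(self):
        # Copy events the local log does not have yet (the network step).
        have = len(self.local_log.events)
        for _seqno, sql in self.master_log.read_after(have):
            self.local_log.append(sql)

    def apply(self):
        # Apply downloaded events in order (the JDBC step in the diagram).
        for seqno, sql in self.local_log.read_after(self.applied_seqno):
            self.applied.append(sql)
            self.applied_seqno = seqno


master_log = TransactionLog()
slave = SlaveReplicator(master_log)
master_log.append("INSERT INTO foo VALUES (1, 6)")
master_log.append("INSERT INTO foo VALUES (7, 25)")
lag_before = len(master_log.events) - slave.applied_seqno
slave.download()
slave.apply()
lag_after = len(master_log.events) - slave.applied_seqno
print(lag_before, lag_after)
```

The master's two commits complete before the slave does any work, which is the point of the asynchronous design: slave lag grows and shrinks without slowing the master.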

Parallel apply maximizes DBMS I/O bandwidth when updating replicas

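[Diagram: Slave Replicator Pipeline. Transactions arrive from the master replicator into the slave's THL (transactions + metadata), pass through a parallel queue, and are applied to the slave. The pipeline is built from stages, each made of Extract, Filter, and Apply steps, with several Extract/Filter/Apply channels running in parallel.]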
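One way to picture the parallel queue is as a partitioner that assigns each transaction to a channel, so that channels can apply concurrently while order is preserved inside each channel. The slides do not say how transactions are partitioned; keying by a shard name (for example a schema) is an assumption of this sketch, as are the function names. The apply step here only records sequence numbers instead of touching a database.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(transactions, channels):
    """Assign each (seqno, shard, sql) to a channel; order is kept per shard."""
    queues = [[] for _ in range(channels)]
    for txn in transactions:
        shard = txn[1]
        # crc32 gives a stable shard-to-channel mapping across runs.
        index = zlib.crc32(shard.encode("utf-8")) % channels
        queues[index].append(txn)
    return queues


def apply_channel(queue):
    """Apply one channel's transactions serially; return the applied seqnos."""
    return [seqno for seqno, _shard, _sql in queue]


def parallel_apply(transactions, channels=3):
    queues = partition(transactions, channels)
    # Each channel runs in its own worker; channels do not wait on each other.
    with ThreadPoolExecutor(max_workers=channels) as pool:
        return list(pool.map(apply_channel, queues))


txns = [
    (1, "tenant_a", "UPDATE ..."),
    (2, "tenant_b", "UPDATE ..."),
    (3, "tenant_a", "UPDATE ..."),
    (4, "tenant_c", "UPDATE ..."),
]
result = parallel_apply(txns)
print(result)
```

Because both tenant_a transactions hash to the same channel, they are applied in their original order, while transactions for other shards are free to proceed on other channels.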

Demo: Scalable transaction processing

Distributing Data between Regions and to Other DBMS types

Continuent Disaster Recovery creates composite clusters that span sites and are ready for immediate failover

[Diagram: SJC Master Service (one master, two slaves) feeds NYC Slave Service (one relay, two slaves) through cross-region replication (async master/slave).]


Continuent multi-master, cross-site clusters operate independent, active clusters at two or more remote sites

[Diagram: SJC Service and NYC Service, each with one master and two slaves, linked by cross-region replication (async multi-master).]


The same replication mechanism supports real-time loading of data warehouses

[Diagram: SJC Service (one master, two slaves) replicating to a Hadoop cluster]


Wrap-Up

Master/slave clustering is a robust technology for enterprise data management!

Very High Availability

Updates propagated without cost to applications

Transparent connectivity with full SQL semantics

Very High Performance

Continuent offers…

• Highly available clusters of off-the-shelf MySQL servers
• Zero-downtime maintenance and upgrade
• High performance regardless of data volume or distance
• Replication over regions to DR sites as well as non-MySQL data warehouses

For more information, contact us:

Robert Noyes, Alliance Manager, USA & Canada, +1 (650) 575-0958

Philippe Bernard, Alliance Manager, EMEA & APAC, +41 79 347 1385

Eero Teerikorpi, Sr. Director, Strategic Alliances, +1 (408) 431-3305
