MySQL Patterns in Amazon - Make the Cloud Work For You

Jay Edwards & Ben Black

PalominoDB

{jay, ben}@palominodb.com

MySQL in AWS

Patterns

Agenda

1. Introduction

2. RDS, EC2/MySQL

3. Web console, CLI, API

4. Performance/Availability

5. Implementation choices

6. Managing DDL

7. Common failures

8. Cost

9. Questions

About us

Jay!

CTO, PDB, OFA, Twitter

Ben!

Sr. DBA, PDB, Garmin

Booth? Yes. Hiring? Yes.

Interactivity

Ask away; we've got time. Ben will be glad to try to solve your problems.

AWS tutorial?

• "Click on the replica button and come back in 30 minutes"

• "PIOPs <-> EBS. Uncheck that box and come back in 2 hours"

RDS and EC2/MySQL

RDS benefits

Fully managed

• High Availability

• Replicas? *click*

• Point-in-time recovery? *click*

• *click, click, click*

RDS un-benefits

Fully managed

• No binlog access

• No SUPER

• No flexible topology

The more experienced a DBA you are, the crankier you will be.

RDS improves!

Like every AWS service, RDS keeps gaining features.

It's perfect for developers, proofs of concept, one-offs, and absorbing temporary load.

(Tungsten supports replication into RDS from MySQL).

EC2/MySQL

All the MySQL you've come to love & hate

Multi-Region via replication & WAN tunnel

Why RDS or EC2?

RDS

1. You can tolerate ~99% uptime (which many people can)

2. You don't have lots of DBAs and need to optimize for operational ease

EC2

1. Multi-region availability

2. Vertical scaling

Questions?

Any particular scenarios you want to ask us about?

Web Console, CLI, API

Overview

Functionality isn't complete

• some things aren't exposed through every interface

Web Console

Most of the stuff you need for common day-to-day maintenance

Sometimes:

• slow

• isn't working

• needs rage-clicking

CLI setup

RDS CLI

export AWS_RDS_HOME

export AWS_CREDENTIAL_FILE

(AWSAccessKeyId,AWSSecretKey)

http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference/Welcome.html


CLI pain

It's written in Java right now*. The JVM startup overhead on every invocation makes it painfully slow for large-scale automation.

* The future is the Redshift CLI (Python, coherent interface)

CLI output

Verbose and clunky

DBINSTANCE,scp01-replica2,2010-05-22T01:53:47.372Z,db.m1.large,mysql,50,(nil),master,available,scp01-replica2.cdysvrlmdirl.us-east-1.rds.amazonaws.com,3306,us-east-1b,(nil),0,(nil),(nil),(nil),(nil),(nil),(nil),(nil),(nil),sun:05:00-sun:09:00,23:00-01:00,(nil),n,5.1.50,n,simcoprod01,general-public-license

SECGROUP,Name,Status

SECGROUP,default,active

PARAMGRP,Group Name,Apply Status

PARAMGRP,default.mysql5.1,in-synch

Combining the worst features of machine- and human-readable text formats.
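
The output is at least regular enough to parse: a record-type tag in the first field, comma-separated values after it, and `(nil)` for anything missing. A minimal sketch, assuming that layout holds beyond this one sample; `parse_rds_cli` is a hypothetical helper, not part of any AWS tool, and the field positions are inferred from the slide rather than from a documented schema.

```python
from collections import defaultdict

# Sample copied from the slide; no official schema is assumed.
SAMPLE = """\
DBINSTANCE,scp01-replica2,2010-05-22T01:53:47.372Z,db.m1.large,mysql,50,(nil),master,available,scp01-replica2.cdysvrlmdirl.us-east-1.rds.amazonaws.com,3306,us-east-1b,(nil),0,(nil),(nil),(nil),(nil),(nil),(nil),(nil),(nil),sun:05:00-sun:09:00,23:00-01:00,(nil),n,5.1.50,n,simcoprod01,general-public-license
SECGROUP,Name,Status
SECGROUP,default,active
PARAMGRP,Group Name,Apply Status
PARAMGRP,default.mysql5.1,in-synch
"""


def parse_rds_cli(text):
    """Group rows by their leading record tag and map "(nil)" to None."""
    records = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, *fields = line.split(",")
        records[tag].append([None if f == "(nil)" else f for f in fields])
    return dict(records)


parsed = parse_rds_cli(SAMPLE)
instance = parsed["DBINSTANCE"][0]
# Positions 0, 2 and 7 hold the identifier, class and status in this sample.
print(instance[0], instance[2], instance[7])
```

Note that header rows (such as `SECGROUP,Name,Status`) come back as ordinary rows; a real script would have to decide how to treat them.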

API

Use Boto! (Mitch works for AWS).

Apply immediately.

--apply-immediately

Check the box hiding at the bottom of the page.

Availability

How many nines?

EC2 Region SLA

99.95% SLA

“Annual Uptime Percentage” is calculated by subtracting from 100% the percentage of 5 minute periods during the Service Year in which Amazon EC2 was in the state of “Region Unavailable.”

("Region unavailable" == "multiple AZs are toast")

Implies you've got to go multi-region

EC2 Region SLA

~99.2% Reality

The previous definition is very strict: 2 or more AZs down, can't create instances, blah, blah.

Once or twice a year: multi-AZ degradation (EBS, network, who knows)
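
To put the nines in perspective, the figures quoted in this deck (99.95% SLA, ~99.2% observed, ~99% for RDS) can be converted into hours of downtime per year. A quick sketch; the helper name is ours and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def annual_downtime_hours(availability_pct):
    """Hours per year a service is down at the given availability percentage."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


for pct in (99.95, 99.2, 99.0):
    print(f"{pct}% uptime -> {annual_downtime_hours(pct):.1f} hours down per year")
```

The gap between the SLA and the observed number is the difference between an afternoon and roughly three days of downtime per year.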

Multi-region

It's coming for RDS. Probably before the end of the year.

Until then...

Always go Multi-AZ

Minimal downtime for most maintenance

Saves you from most master crashes

{Sometimes, often, frequently} destroys all replicas

Multi-AZ binlogs

sync_binlog=1

innodb_support_xa=1

• used to DESTROY write throughput

• MySQL 5.6 brings drastic improvements

Questions?

Questions about designing for availability?

Implementation Choices

Instance sizing

Dynamicity == reduced cost

(Now, in general, $$ isn't why you go to the cloud; it's operational efficiency & reduced friction).

Have a spreadsheet and do capacity analyses frequently.

Ephemeral SSDs

Really nice! 150,000* IOPs

Really bad! ~~POOF~~

• Excellent for replicas

• Requires operational excellence

* YMMV

Provisioned IOPs

Really, really nice!

• Drastically lower failure rate (an order of magnitude)

• Guaranteed throughput

Not so nice.

• Costs $$

Provisioned IOPs

Masters and replicas can be different.

You can convert PIOPs <-> EBS in either direction.

Consider a multi-AZ PIOPs master for the best durability.

VPCs

Go VPC from the beginning for production.

• Hard to convert

• Use ELBs for internal load-balancing

• Not sharing the 10.0.0.0/8 network with everybody

Cluster compute

Placement groups are available for CC instances.

"Placement group" means "physically close hardware".

Very low-latency, full-bisection 10GbE networking

Questions?

Questions about your particular setup?

Managing DDL

DDL

Not possible to perform DDL on a slave, then swap it with the master.

Slave promotion

Blocking DDL

DDL

Online schema changes

(log_bin_trust_function_creators)

No OS access

Be careful cleaning up if you ctrl-c

CALL mysql.rds_skip_repl_error;

Questions?

Questions about DDL?

Escape from RDS

mysql schema

--routines

users?

Dumping Users

mysql --host=olddatabasehost -BNe "select concat('\'',user,'\'@\'',host,'\'') from mysql.user where user not like 'rds%' and user != 'master'" | \
while read uh; do mysql --host=olddatabasehost -BNe "show grants for $uh" | sed 's/$/;/; s/\\\\/\\/g'; done > user_grants.sql

http://www.villescorner.com/2012/11/mysqldump-from-amazon-rds-headaches-of.html
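
The `sed` step in that pipeline does two things to each `SHOW GRANTS` line: it collapses doubled backslashes (the `mysql` client's batch mode escapes backslashes in its output) and appends the terminating semicolon. A sketch of the same clean-up in Python, for anyone scripting the migration there instead; `normalize_grant` is a hypothetical name and the sample grants are made up.

```python
def normalize_grant(line):
    """Mirror the sed step: collapse doubled backslashes, then end the statement."""
    return line.rstrip("\n").replace("\\\\", "\\") + ";"


# Made-up examples of batch-mode output; the second contains an escaped wildcard.
raw_grants = [
    "GRANT USAGE ON *.* TO 'app'@'%'",
    "GRANT SELECT ON `my\\\\_db`.* TO 'app'@'%'",
]

for grant in raw_grants:
    print(normalize_grant(grant))
```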

Common failures

(Should really be called zones and regions)

Operations is about managing change and mitigating risk.

Local failures

Database crashes

Human error

Localized EBS hang

How to mitigate?

Multi-AZ PIOPs master

Operational excellence

Throw away & rebuild replicas

Local failures redux

Local failures should be, at most, annoyances.

Runbooks*

Game days

Monitoring

* Process is a poor substitute for competence.

If you can't deal with expected and desired change, you'll never be able to handle unexpected and unwanted change.

Regional failures

A well-designed architecture will save you.

How quickly can your DNS flip?

How good is your replication?

Do you have a CDN?

Is your application going to run?

Not everybody can afford this.

Zones and Regions

A zone is analogous to a data center (for some small number of buildings).

A region is a geographically dispersed collection of zones that is distinct from any other region.

Zones & Regions differ

Different instance types

Different features

Different provisioning capacity

OFA had ~40% of the US-East medium instances at one point. Couldn't duplicate that in US-West.

Questions?

Cost

Reserved instances

• Substantial savings (how often do you turn off production databases?)

• Secondary market

• Must match AZ and instance size

• Discount coupon

Heavy utilization instances charge the hourly rate 24x7, whether or not the instance is running
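
Because a heavy utilization reservation bills every hour of the term, the useful question is how busy an instance must be before the reservation beats on-demand. A back-of-the-envelope sketch; all prices below are hypothetical placeholders, not AWS list prices, and the function names are ours.

```python
HOURS_PER_YEAR = 365 * 24


def reserved_cost(upfront, reserved_hourly, term_hours=HOURS_PER_YEAR):
    """Upfront fee plus the hourly rate for every hour of the term (billed 24x7)."""
    return upfront + reserved_hourly * term_hours


def on_demand_cost(on_demand_hourly, utilization, term_hours=HOURS_PER_YEAR):
    """On-demand: pay only for the fraction of the term the instance runs."""
    return on_demand_hourly * term_hours * utilization


def break_even_utilization(upfront, reserved_hourly, on_demand_hourly,
                           term_hours=HOURS_PER_YEAR):
    """Fraction of the term above which the reservation is the cheaper option."""
    return reserved_cost(upfront, reserved_hourly, term_hours) / (
        on_demand_hourly * term_hours
    )


# Hypothetical prices, for illustration only.
u = break_even_utilization(upfront=876.0, reserved_hourly=0.10, on_demand_hourly=0.40)
print(f"Reservation pays off above {u:.0%} utilization")
```

With these made-up numbers the break-even is about 50% utilization; a production database that never turns off is far above that, which is the point of the first bullet.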

Watch the $

Spreadsheet!

Inventory!

Load analysis!

Cloudability!

Dynamicity

The only thing you can't do is downsize storage.

Change instance size? Check.

Turn PIOPs off? Check.

Delete replicas? Check.

Up to meet need. Down to meet budget.

Upgrading

Minor upgrade (can happen automatically during the maintenance window; will reboot or fail over)

* Disable this

Upgrade from 5.5 to 5.6

1) Dump/load

2) Delta load

3) Switchover

Fin

Ask away!
