
Posted on 25-May-2020


© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark

Deep Dive on MySQL Databases on Amazon RDS

Chayan Biswas

Sr. Product Manager, AWS

cbbiswas@amazon.com

Amazon Relational Database Service (Amazon RDS): a managed relational database service with a choice of popular database engines

Easy to administer: easily deploy and maintain hardware, OS, and DB software; built-in monitoring

Performant & scalable: scale compute and storage with a few clicks; minimal downtime for your application

Available & durable: automatic Multi-AZ data replication; automated backups, snapshots, and failover

Secure & compliant: data encryption at rest and in transit; industry compliance and assurance programs

Amazon RDS database engines

Commercial and open source engines: Amazon Elastic Block Store (Amazon EBS)-based storage

Cloud native (Amazon Aurora, MySQL compatible and PostgreSQL compatible): Amazon Aurora storage system


Agenda

• Why run MySQL

• Why run managed MySQL on Amazon RDS

• Why Aurora MySQL?

Why run MySQL

1. Popular

2. Innovative

3. Flexible

Image credit: By Mackphillips - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=55946550


MySQL is the world’s most popular database

MySQL was the most commonly used database among all respondents to the Stack Overflow Developer Survey 2018.

Source: Stack Overflow Developer Survey Results 2018 (https://insights.stackoverflow.com/survey/2018/#technology)

“Most popular” buys you:

• A large ecosystem of ISVs, tools, and implementation and support partners

• Highly exercised, stable code

• A large community of users, community-driven resources, and a larger DBA talent pool


MySQL 8.0 highlights - FUNCTIONALITY

• Common Table Expressions

• Window functions

• JSON improvements

• 5108 Spatial Reference Systems

• utf8mb4 as the default character set
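To make the first two items concrete, here is a minimal sketch of a query that combines a common table expression with window functions. It runs the SQL through Python's built-in `sqlite3` module as a stand-in for a MySQL 8.0 server (SQLite 3.25 or later accepts the same standard `WITH` and `OVER (...)` syntax); the `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 8.0; the query itself is standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 20), ("west", 5)],
)

QUERY = """
WITH positive_orders AS (  -- common table expression: a named subquery
    SELECT region, amount FROM orders WHERE amount > 0
)
SELECT region,
       amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY amount) AS running_total,  -- per-region running sum
       RANK() OVER (ORDER BY amount DESC) AS overall_rank                       -- rank across all rows
FROM positive_orders
ORDER BY region, amount
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Before MySQL 8.0, the running total and rank would typically need self-joins or user variables; the window functions express them directly in one pass.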

MySQL 8.0 highlights - AVAILABILITY

• Instant ADD COLUMN

• Unified, transactional data dictionary

• Crash-safe, atomic DDL

MySQL 8.0 highlights - PERFORMANCE

• Hot-spot management

• Descending indexes

• Invisible indexes

• Improved optimizer cost model

• Resource Groups

• Improved replication

MySQL 8.0 highlights – SECURITY, MANAGEABILITY

• Roles

• Password strength

• OpenSSL as the default TLS/SSL library

• Enhanced observability

MariaDB 10.3 highlights

• Oracle compatibility: PL/SQL compatibility parser

• Sequences

• INTERSECT and EXCEPT to complement UNION

• New ROW data type and TYPE OF anchored types for stored routines

• Invisible columns

• Cursor with parameters

• Temporal data processing

• User-defined aggregates

• Instant ADD COLUMN for InnoDB


MariaDB 10.4 highlights (Release Candidate)

• SQL Server compatibility: sql_mode='MSSQL'

• Subset of Microsoft SQL Server's language

• Password expiry and account locking

• Instant DROP COLUMN for InnoDB


Why run managed MySQL on Amazon RDS


Hundreds of thousands of customers

Popular buys you . . .

• Unrivalled operational excellence

• Highly exercised, stable code

Who can you trust?

• World’s most experienced operators

• Automation

• Automated remediation


Amazon RDS highlights - AVAILABILITY

• Automated failover across AZs with zero data loss (0-RPO)

• Managed cross-region replicas for disaster recovery (DR)

• Automated backups, manual snapshots

• Point-in-time recovery

Automated, 0-RPO failover across AZs

Diagram: the App reaches the primary instance (EC2 #1 with EBS #1, Availability zone 1) through DNS; a standby instance (EC2 #2 with EBS #2, Availability zone 2) holds a synchronously replicated copy of the data.

• Each host manages a set of EBS volumes with a full copy of the data

• Instances are monitored by an external observer to maintain consensus over quorum

• Failover is initiated by automation or through the RDS API

• Redirection to the new primary instance is provided through DNS; after failover, EC2 #2 in Availability zone 2 is the primary and EC2 #1 is the standby
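The failover sequence can be sketched as a toy model, assuming synchronous replication to the standby (which is what makes the failover lose no acknowledged writes). `Host`, `MultiAZPair`, and the host names are hypothetical illustrations, not the RDS implementation or API.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """One side of the pair: an instance plus the EBS volumes it manages (hypothetical model)."""
    name: str
    volume: list = field(default_factory=list)
    healthy: bool = True


class MultiAZPair:
    """Toy primary/standby pair behind one DNS name; not the RDS implementation."""

    def __init__(self, primary: Host, standby: Host) -> None:
        self.primary = primary
        self.standby = standby
        self.dns_target = primary.name  # the endpoint currently resolves to the primary

    def commit(self, record: str) -> bool:
        # Synchronous replication: acknowledge only once both copies hold the record,
        # so a later failover cannot lose an acknowledged write (RPO of 0).
        if not self.primary.healthy:
            return False
        self.primary.volume.append(record)
        self.standby.volume.append(record)
        return True

    def failover(self) -> None:
        # Promote the standby and repoint DNS; clients reconnect to the same name.
        self.primary, self.standby = self.standby, self.primary
        self.dns_target = self.primary.name


pair = MultiAZPair(Host("ec2-1-az1"), Host("ec2-2-az2"))
for i in range(3):
    pair.commit(f"txn-{i}")
pair.primary.healthy = False  # simulate losing the primary's host
pair.failover()
print(pair.dns_target, pair.primary.volume)
```

The application never learns a new address: it keeps using the same DNS name, which is why the redirection step needs no client-side change.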

Read Scaling with Read Replicas

Use Amazon RDS read replicas to relieve pressure on your source database with additional read capacity

Create up to five replicas per source database

Monitor replication lag in Amazon CloudWatch or Amazon RDS console

Diagram: within a Region, the source database feeds its read replicas through asynchronous replication.
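Because replication to read replicas is asynchronous, a replica can serve stale data. A minimal sketch of lag-aware read routing, assuming the application already has recent lag readings per replica (for example from the CloudWatch `ReplicaLag` metric); the function name and replica names are hypothetical.

```python
def choose_read_endpoint(replica_lag_seconds: dict[str, float], source: str,
                         max_lag_seconds: float) -> str:
    """Pick the freshest replica within tolerance, else fall back to the source (hypothetical helper)."""
    # Keep only replicas whose latest lag reading is acceptable for this read.
    fresh = {name: lag for name, lag in replica_lag_seconds.items() if lag <= max_lag_seconds}
    if not fresh:
        # Every replica is too far behind: read from the source instead.
        return source
    # Prefer the replica that is furthest caught up.
    return min(fresh, key=lambda name: fresh[name])


print(choose_read_endpoint({"replica-1": 0.5, "replica-2": 12.0, "replica-3": 0.2}, "source", 5.0))
```

Falling back to the source trades extra load on the primary for read-your-writes freshness; a stricter application might instead reject the read.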

Planning for Disaster Recovery

Use a cross-region read replica as a standby database for recovery in the event of a disaster

Read replicas can be configured for Multi-AZ to reduce recovery time

You can use delayed replication for MySQL to protect against self-inflicted disasters, such as an accidental DROP TABLE

Diagram: in Region 1, the primary (Availability Zone 1) replicates synchronously to a standby (Availability Zone 2); asynchronous replication carries changes to a cross-region read replica in Region 2, which replicates synchronously to its own standby (Availability Zones 3 and 4).

NEW!

NEW!

Backups, Snapshots, and Point-in-time restore

Two options: automated backups and manual snapshots

EBS snapshots are stored in Amazon S3

Transaction logs are stored in S3 every 5 minutes to support point-in-time recovery

Minimal performance impact from backups (on Multi-AZ deployments, backups are taken from the standby)

Snapshots can be copied across regions or shared with other accounts

Diagram: the App writes to an EBS volume in Availability zone 1; EBS snapshots and transaction logs are stored in Amazon S3, from which new EBS volumes can be restored in Availability zone 2 or in Region 2.
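A minimal sketch of the point-in-time recovery window, assuming a simplified model: the earliest restorable time is the start of the retention period, and the latest is the most recent transaction-log upload to S3. The function names are hypothetical; RDS itself reports the actual upper bound as `LatestRestorableTime`.

```python
from datetime import datetime, timedelta


def restorable_window(now: datetime, retention_days: int,
                      last_log_upload: datetime) -> tuple[datetime, datetime]:
    """Return (earliest, latest) restorable times under the simplified model (hypothetical helper)."""
    # Oldest point: the start of the backup retention period.
    earliest = now - timedelta(days=retention_days)
    # Newest point: changes are recoverable only once their transaction logs reach S3.
    latest = last_log_upload
    return earliest, latest


def can_restore_to(target: datetime, now: datetime, retention_days: int,
                   last_log_upload: datetime) -> bool:
    earliest, latest = restorable_window(now, retention_days, last_log_upload)
    return earliest <= target <= latest


now = datetime(2020, 5, 25, 12, 0)
last_upload = datetime(2020, 5, 25, 11, 55)  # logs are shipped roughly every 5 minutes
print(restorable_window(now, 7, last_upload))
```

With 5-minute log uploads, a restore target a few minutes in the past may not yet be reachable; that gap is the practical lower bound on how recent a point-in-time restore can be.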

New Amazon RDS backup features

• Retain automated backups: optionally keep automated backups and transaction logs upon instance deletion; they are retained for the instance's backup retention period NEW!

• Specify a parameter group on restore NEW!

• Incremental encrypted snapshot copy NEW!

Amazon RDS highlights – SECURITY, MANAGEABILITY

• IAM DB Authentication

• Automated OS and database upgrades

• Push-button scaling

• Managed binlog replication

• Log upload to CloudWatch Logs

• Industry compliance

• Per-second billing NEW!

Recommendations

Example issues:

Engine version outdated, Pending maintenance available, Automated backups disabled, Enhanced Monitoring disabled, Encryption disabled

Parameter recommendations:

Non-default custom memory parameters, Change buffering enabled, Logging to table

Aurora cluster recommendations

NEW!

NEW!

Start and Stop

Solution for development and test environments

Stop and start a running database instance from the console or AWS Command Line Interface (AWS CLI)

Now available for both Single-AZ and Multi-AZ DB instances and for Aurora DB clusters

While an instance is stopped, you pay only for storage

The backup retention window is maintained while the instance is stopped

A stopped instance is automatically restarted after seven days

NEW!
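The two rules on this slide (pay only for storage while stopped; automatic restart after seven days) can be sketched as a small cost and scheduling helper. The hourly and per-GB rates below are hypothetical placeholders, not RDS prices, and the function names are illustrative.

```python
from datetime import datetime, timedelta

# From the slide: a stopped instance is started again automatically after seven days.
MAX_STOPPED = timedelta(days=7)


def forced_start_time(stopped_at: datetime) -> datetime:
    """When the instance will be restarted if nobody starts it first."""
    return stopped_at + MAX_STOPPED


def monthly_cost(running_hours: float, instance_rate_per_hour: float,
                 storage_gb: float, storage_rate_per_gb_month: float) -> float:
    # While stopped you pay only for storage, so compute is charged for running hours only.
    return running_hours * instance_rate_per_hour + storage_gb * storage_rate_per_gb_month


# Hypothetical rates: 0.10 per instance-hour, 0.10 per GB-month, 100 GB of storage.
always_on = monthly_cost(730, 0.10, 100, 0.10)       # running the whole month
business_hours = monthly_cost(220, 0.10, 100, 0.10)  # 10 hours x 22 weekdays
print(round(always_on, 2), round(business_hours, 2))
print(forced_start_time(datetime(2020, 5, 1, 18, 0)))
```

A dev/test schedule that stops an instance for longer than a week must therefore stop it again after the forced restart.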

Amazon RDS highlights – PERFORMANCE

• R5, M5, and T3 database instance families

• Elastic volumes up to 64 TB

• Up to 80K Provisioned IOPS

NEW!

NEW!

NEW!

Performance Insights

• Measures DB Load (the average number of active sessions)

• Identifies bottlenecks (top SQL, wait events)

• Adjustable time frame (hour, day, week, longer)

New features in RDS Performance Insights

• Engine support: MySQL, MariaDB

• Extended data retention: retain up to two years of performance data

• Trend performance over time, analyze month-over-month activity, and compare end-of-quarter or end-of-year performance with earlier performance

• Load metrics in CloudWatch: DBLoad, DBLoadCPU, DBLoadNonCPU

• AWS CloudFormation support

NEW!

NEW!

NEW!
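DB Load is the average number of active sessions, split into sessions running on CPU and sessions waiting. A minimal sketch of that calculation, assuming a hypothetical input format of one-second snapshots that list each active session's state (`"CPU"` or a wait-event name; the event names below are illustrative).

```python
from statistics import mean


def db_load(samples: list[list[str]]) -> dict[str, float]:
    """Average active sessions over a series of one-second snapshots (hypothetical input format).

    Each snapshot lists the state of every active session: "CPU" if it is running on CPU,
    otherwise the name of the wait event it is blocked on.
    """
    total = mean(len(snapshot) for snapshot in samples)
    on_cpu = mean(sum(1 for state in snapshot if state == "CPU") for snapshot in samples)
    return {"DBLoad": total, "DBLoadCPU": on_cpu, "DBLoadNonCPU": total - on_cpu}


samples = [
    ["CPU", "CPU", "io/table/sql/handler"],
    ["CPU"],
    ["CPU", "synch/mutex/innodb/buf_pool_mutex"],
]
print(db_load(samples))
```

Comparing DBLoad with the instance's vCPU count is the usual reading: load persistently above the vCPU count means sessions are queuing, and the non-CPU share points at which waits dominate.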

Monitor RDS with CloudWatch

• Amazon CloudWatch metrics

• CPU/Storage/Memory

• Swap usage

• I/O (read and write)

• Latency (read and write)

• Throughput (read and write)

• Replica lag

• Amazon CloudWatch alarms

• Similar to on-premises monitoring tools

• Enhanced monitoring

• Access to additional CPU, memory, file system, and disk I/O metrics

• As low as one-second intervals

• Integration with third-party monitoring tools
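CloudWatch alarms can require M out of the last N datapoints to breach a threshold (the `EvaluationPeriods` and `DatapointsToAlarm` alarm settings). A simplified sketch of that evaluation, applied to hypothetical replica-lag readings; it ignores CloudWatch's configurable treatment of missing data.

```python
def alarm_state(datapoints: list[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Simplified M-out-of-N alarm evaluation (illustrative, not the CloudWatch service logic)."""
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical ReplicaLag readings in seconds, one per period.
print(alarm_state([1, 2, 45, 50, 3], threshold=30, evaluation_periods=5, datapoints_to_alarm=2))
```

Requiring two breaching datapoints out of five keeps a single spike from paging anyone, while a sustained lag still triggers the alarm.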

Database activity monitoring and insights

Continuously monitor activity in your DB clusters by sending audit logs to CloudWatch Logs.

• Search: look for specific events across log files.

• Metrics: measure activity in your Aurora DB cluster.

• Visualizations: create activity dashboards.

• Alarms: get notified or take actions.

Export to S3 for long-term archival, analyze logs using Athena, and visualize logs with QuickSight.

Services shown: Amazon CloudWatch, Amazon S3, Amazon Athena, Amazon QuickSight


Why Aurora MySQL?

Amazon Aurora: enterprise database at open source price

Delivered as a managed service

Amazon Aurora

Speed and availability of high-end commercial databases

Simplicity and cost-effectiveness of open source databases

Drop-in compatibility with MySQL and PostgreSQL

Simple pay as you go pricing

Amazon Aurora innovations: re-imagining databases for the cloud

Automate administrative tasks – fully managed service

Scale-out, distributed, multi-tenant design

Service-oriented architecture leveraging AWS services

Scale-out, distributed architecture

Purpose-built log-structured distributed storage system designed for databases

Storage volume is striped across hundreds of storage nodes distributed over 3 different availability zones

Six copies of data, two copies in each availability zone, to protect against AZ+1 failures (the loss of an entire availability zone plus one additional storage node)

Plan to apply the same principles to other layers of the stack

Diagram: database instances in Availability Zones 1, 2, and 3, each handling SQL, transactions, and caching, share one storage volume built from storage nodes with SSDs.
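The AZ+1 claim follows from quorum arithmetic. A minimal sketch, assuming Aurora's published quorum sizes of 4 of 6 copies for writes and 3 of 6 for reads; the code only counts surviving copies and is illustrative, not Aurora's storage protocol.

```python
from itertools import product

AZS = ("az1", "az2", "az3")
COPIES = [(az, i) for az in AZS for i in (1, 2)]  # six copies, two per AZ
WRITE_QUORUM = 4  # two write quorums always overlap: 4 + 4 > 6
READ_QUORUM = 3   # every read quorum overlaps every write quorum: 3 + 4 > 6


def live_copies(failed: set) -> int:
    return sum(1 for copy in COPIES if copy not in failed)


def can_write(failed: set) -> bool:
    return live_copies(failed) >= WRITE_QUORUM


def can_read(failed: set) -> bool:
    return live_copies(failed) >= READ_QUORUM


def az_plus_one_failures():
    """Every way to lose one whole AZ plus one more copy in a different AZ."""
    for az, extra in product(AZS, COPIES):
        if extra[0] != az:
            yield {(az, 1), (az, 2), extra}


print(can_write({("az1", 1), ("az1", 2)}))  # a whole AZ is down: writes continue
print(all(can_read(f) for f in az_plus_one_failures()))  # AZ+1 down: data is still readable
```

Losing a whole AZ leaves four copies, so writes continue; losing an AZ plus one more node leaves three, which still satisfies the read quorum, so no data is lost and the missing copies can be rebuilt.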

Leveraging AWS services

• Invoke AWS Lambda functions from stored procedures and triggers

• Load data from Amazon Simple Storage Service (Amazon S3); store snapshots and backups in S3

• Use AWS Identity and Access Management (IAM) roles to manage database access control

• Upload system metrics and audit logs to Amazon CloudWatch


Aurora customer adoption

Fastest growing service in AWS history

Aurora is used by ¾ of the top 100 AWS customers

Who is moving to Aurora and why?

Customers using open source engines

• Higher performance (up to 5x)

• Better availability and durability

• Reduced cost (up to 60%)

• Easy migration; no application change

Customers using commercial engines

• One tenth of the cost; no licenses

• Integration with the cloud ecosystem

• Comparable performance and availability

• Migration tooling and services


Enterprise-grade

Performance and Scalability

Write and read throughput: Aurora MySQL delivers up to 5x the throughput of MySQL

Chart: write throughput and read throughput for MySQL 5.6, MySQL 5.7, MySQL 8.0, Aurora 5.6, and Aurora 5.7

Benchmark: Sysbench with 250 tables and 200,000 rows per table on an r4.16xlarge instance

Bulk data load performance: Aurora MySQL loads data 2.5x faster than MySQL

Chart: runtime (sec.) of data loading and index build, MySQL vs. Amazon Aurora

Benchmark: 10 Sysbench tables, 10 million rows each

How did we achieve this?

Do less work

• Do fewer IOs

• Minimize network packets

• Cache prior results

• Offload the database engine

Be more efficient

• Process asynchronously

• Reduce latency path

• Use lock-free data structures

• Batch operations together

• Databases are all about I/O

• Network-attached storage is all about packets/second

• High-throughput processing is all about context switches

Aurora I/O profile

MySQL with Replica. Diagram: a primary instance in AZ 1 and a replica instance in AZ 2, each writing to Amazon Elastic Block Store (EBS) with an EBS mirror, with backups to Amazon S3; the write path is numbered 1 to 5. Types of write: binlog, data, double-write, log, FRM files.

MySQL I/O profile for 30 min Sysbench run

• 780K transactions

• 7,388K I/Os per million txns (excludes mirroring, standby)

• Average 7.4 I/Os per transaction

Amazon Aurora. Diagram: a primary instance in AZ 1 and replica instances in AZ 2 and AZ 3; writes are distributed to storage asynchronously and acknowledged by a 4/6 quorum, with backups to Amazon S3.

Aurora I/O profile for 30 min Sysbench run

• 27,378K transactions (35x more)

• 0.95 I/Os per transaction (6x amplification), 7.7x less
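The headline ratios can be checked directly from the MySQL and Aurora I/O figures on this slide; the variable names are illustrative.

```python
# Figures from the 30-minute Sysbench runs on this slide.
mysql_txns = 780_000
mysql_ios_per_million_txns = 7_388_000
aurora_txns = 27_378_000
aurora_ios_per_txn = 0.95

mysql_ios_per_txn = mysql_ios_per_million_txns / 1_000_000  # about 7.4
throughput_ratio = aurora_txns / mysql_txns                  # about 35x more transactions
io_reduction = mysql_ios_per_txn / aurora_ios_per_txn        # about 7.8x fewer I/Os per transaction

print(f"{mysql_ios_per_txn:.1f} I/Os per MySQL transaction")
print(f"{throughput_ratio:.1f}x more transactions on Aurora")
print(f"{io_reduction:.1f}x fewer I/Os per transaction on Aurora")
```

Computed from the rounded figures shown, the I/O reduction comes out near 7.8x; the small gap from the slide's 7.7x presumably reflects rounding in the published numbers.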


Aurora lock management

Diagram: scan, insert, and delete operations on lock chains, handled by the MySQL lock manager versus the Aurora lock manager

• Same locking semantics as MySQL

• Concurrent access to lock chains

• Concurrent access to the lock chain and the lock manager, which can be updated simultaneously

• Lock-free deadlock detection

Instant crash recovery

Traditional database

• Has to replay logs since the last checkpoint

• Typically 5 minutes between checkpoints

• Single-threaded in MySQL; requires a large number of disk accesses

Amazon Aurora

• Underlying storage replays redo records on demand as part of a disk read

• Parallel, distributed, asynchronous

• No replay for startup

Diagram: checkpointed data and redo log over time. In a traditional database, a crash at T0 requires re-applying the changes recorded in the redo log since the last checkpoint. In Aurora, a crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously.
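The on-demand approach can be sketched with a toy storage segment that keeps redo records per page and applies them only when that page is read, so a restart needs no replay pass. `Segment` is hypothetical, and each redo record is simplified to a full new page value rather than a real physical change record.

```python
from collections import defaultdict


class Segment:
    """Toy storage segment: page images plus pending redo records, applied lazily on read."""

    def __init__(self) -> None:
        self.pages = {}                   # page_id -> (page_lsn, value)
        self.pending = defaultdict(list)  # page_id -> [(lsn, new_value), ...]

    def log(self, page_id: str, lsn: int, new_value: str) -> None:
        # A write only appends redo; the page image is not rebuilt here.
        self.pending[page_id].append((lsn, new_value))

    def read(self, page_id: str):
        # Apply this page's outstanding redo records in LSN order, then serve the page.
        page_lsn, value = self.pages.get(page_id, (0, None))
        for lsn, new_value in sorted(self.pending.pop(page_id, [])):
            if lsn > page_lsn:
                page_lsn, value = lsn, new_value
        self.pages[page_id] = (page_lsn, value)
        return value


seg = Segment()
seg.log("p1", 1, "a")
seg.log("p1", 2, "b")
seg.log("p2", 3, "x")
# A crash and restart here needs no replay pass: the redo records simply stay queued per page.
print(seg.read("p1"))  # only page p1 is brought up to date
```

Recovery cost is thus spread across the first reads of each page, and different segments catch up independently and in parallel.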

When the database fails, recovery is fast: under 30 seconds

Chart: distribution of failover times, one panel per range

• 0-5 s: 30% of failovers

• 5-10 s: 40% of failovers

• 10-20 s: 25% of failovers

• 20-30 s: 5% of failovers
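Reading the failover buckets cumulatively makes the distribution easier to use for planning; this short calculation uses only the percentages given on the slide.

```python
from itertools import accumulate

# Share of failovers per duration bucket, as given on the slide.
buckets = [("0-5 s", 30), ("5-10 s", 40), ("10-20 s", 25), ("20-30 s", 5)]

labels = [label for label, _ in buckets]
cumulative = list(accumulate(pct for _, pct in buckets))

for label, pct in zip(labels, cumulative):
    upper = label.split("-")[1]  # upper edge of the bucket, e.g. "10 s"
    print(f"{pct}% of failovers completed within {upper}")
```

So 70% of failovers finish within 10 seconds and 95% within 20 seconds, which is a more useful basis for client retry and timeout settings than the 30-second maximum alone.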


Recent Innovations

Aurora MySQL

Performance

• Parallel Query

Availability

• Global Database, Backtrack

Manageability

• Serverless, cluster start/stop, DB log upload to CloudWatch Logs, synchronous Lambda calls, custom endpoints

Security

• Encrypted self-managed MySQL to Aurora migration

Parallel query processing

Aurora storage has thousands of CPUs

• Opportunity to push down and parallelize query processing

• Moving processing close to data reduces network traffic and latency

However, there are significant challenges

• Data is not range partitioned, so queries require full scans

• Data may be in flight

• Read views may not allow viewing the most recent data

• Not all functions can be pushed down

Diagram: the database node pushes predicates down to the storage nodes, which return aggregated results.

https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/
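The push-down idea can be sketched in a few lines: each "storage node" filters its own rows and returns only a partial aggregate, and the head node merges the partials. This is a hypothetical illustration using threads, not Aurora's implementation; note that AVG has to be decomposed into COUNT and SUM to be merged correctly.

```python
from concurrent.futures import ThreadPoolExecutor


def storage_node_scan(rows, predicate):
    """Runs next to the data: filter rows and return only a (count, sum) partial aggregate."""
    matched = [row["amount"] for row in rows if predicate(row)]
    return len(matched), sum(matched)


def parallel_avg(partitions, predicate):
    """Head node: fan the scan out, then merge the partials into one average."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda rows: storage_node_scan(rows, predicate), partitions))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count if count else None


# Hypothetical data spread over three storage nodes.
partitions = [
    [{"region": "east", "amount": 10}, {"region": "west", "amount": 4}],
    [{"region": "east", "amount": 20}],
    [],
]
print(parallel_avg(partitions, lambda row: row["region"] == "east"))
```

Only two numbers per node cross the network instead of every matching row, which is the network-traffic saving the slide describes.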

Parallel Query: Performance results

Well-known decision support benchmark. Chart: query response time reduction for queries Q1 to Q22 (scale 0x to 120x)

• Peak speedup ~120x

• >10x speedup: 8 of 22 queries

“We were able to test Aurora’s parallel query feature and the performance gains were very good. To be specific, we were able to reduce the instance type from r3.8xlarge to r3.2xlarge. For this use case, parallel query was a great win for us.”

Jyoti Shandil, Cloud Data Architect

Aurora Global Database

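Diagram: in Region 1, a master (AZ 1) and readers (AZ 2, AZ 3) share Aurora storage; an outbound replication fleet ships changes to an inbound replication fleet in Region 2 (and on to further Regions, up to Region n), where readers run on their own Aurora storage.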

High throughput: Up to 200K writes/sec – negligible performance impact

Low replica lag: < 1 sec cross-country replica lag under heavy load

Fast recovery: < 1 min to accept full read-write workloads after region failure

https://aws.amazon.com/rds/aurora/global-database/

Database Backtrack

Diagram: a timeline t0 to t4 with rewinds to t3 and to t1; changes made after each rewind point become invisible rather than deleted.

Backtrack brings the database to a point in time without requiring a restore from backups

• Backtrack from an unintentional DML or DDL operation

• Backtrack is not destructive. You can backtrack multiple times to find the right point in time

• Also useful for QA (rewind your DB between test runs)

https://aws.amazon.com/blogs/aws/amazon-aurora-backtrack-turn-back-time/
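Why backtracking is non-destructive can be sketched with a toy model: every change stays in a time-ordered log, and a backtrack only moves a visibility cutoff, so a second backtrack to a different point still works. `BacktrackableTable` is hypothetical and does not model new writes made after a backtrack.

```python
class BacktrackableTable:
    """Toy model: changes are kept in a time-ordered log; backtrack only moves a visibility cutoff."""

    def __init__(self) -> None:
        self.log = []       # (timestamp, key, value), appended in time order
        self.cutoff = None  # None means every change is visible

    def write(self, ts: int, key: str, value: int) -> None:
        self.log.append((ts, key, value))

    def backtrack(self, ts: int) -> None:
        # Nothing is removed from self.log; later changes just become invisible.
        self.cutoff = ts

    def snapshot(self) -> dict:
        state = {}
        for ts, key, value in self.log:
            if self.cutoff is None or ts <= self.cutoff:
                state[key] = value
        return state


table = BacktrackableTable()
for ts, key, value in [(0, "a", 1), (1, "b", 2), (2, "a", 3), (3, "c", 4), (4, "b", 5)]:
    table.write(ts, key, value)

table.backtrack(3)
print(table.snapshot())  # state as of t3
table.backtrack(1)
print(table.snapshot())  # state as of t1; the log still holds all five changes
```

Because the log is never truncated by a rewind, trying several points in time is cheap compared with restoring a backup for each attempt.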

Fast database cloning

Create a copy of a database without duplicate storage costs

• Creation of a clone is nearly instantaneous because no data is copied

• Data is copied only on write, when the original and cloned volume data diverge

Typical use cases:

• Clone a production DB to run tests

• Reorganize a database

• Save a point in time snapshot for analysis without impacting production system.

Diagram: clones of the production database serve dev/test applications, benchmarks, and production applications, alongside the production database and its own production applications.

https://aws.amazon.com/blogs/aws/amazon-aurora-fast-database-cloning/
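Copy-on-write cloning can be sketched with layered page maps: cloning freezes the current pages into a shared layer (no page data is copied), and each side then writes into its own private layer. `Volume` is a hypothetical illustration of the technique, not Aurora's storage format.

```python
class Volume:
    """Toy copy-on-write volume: a private layer of pages over shared, frozen layers."""

    def __init__(self, layers=()) -> None:
        self.layers = tuple(layers)  # shared layers, never modified once shared
        self.own = {}                # pages written by this volume only

    def read(self, page_id):
        if page_id in self.own:
            return self.own[page_id]
        for layer in reversed(self.layers):
            if page_id in layer:
                return layer[page_id]
        return None

    def write(self, page_id, data) -> None:
        self.own[page_id] = data  # only the changed page consumes new storage

    def clone(self) -> "Volume":
        # Freeze this volume's private pages into the shared stack by reference,
        # then give both the original and the clone a fresh private layer.
        self.layers = self.layers + (self.own,)
        self.own = {}
        return Volume(self.layers)


prod = Volume()
prod.write(1, "a")
prod.write(2, "b")
dev = prod.clone()     # nearly instantaneous: no page is copied
dev.write(1, "A")      # diverges: the clone gets its own copy of page 1 only
prod.write(2, "B")     # production keeps changing independently
print(prod.read(1), prod.read(2), dev.read(1), dev.read(2))
```

Storage grows only with the pages that actually diverge, which is why a clone costs almost nothing until it is written to.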

Aurora Serverless

Starts up on demand, shuts down when not in use

Scales up/down automatically

No application impact when scaling

Pay per second, 1 minute minimum

Diagram: the application connects through request routers to scalable DB capacity, drawn from a warm pool of instances, over shared database storage.

https://aws.amazon.com/getting-started/tutorials/configure-connect-serverless-mysql-database-aurora/

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html

Amazon RDS Data API for serverless applications

Diagram: millions of IoT/mobile devices call the API endpoint of a Data API fleet, which accesses Amazon Aurora Serverless.

Access through simple web interface

• Public endpoint addressable from anywhere

• No client configuration required

• No persistent connections required

Ideal for Serverless applications (Lambda)

Ideal for lightweight applications (IoT)

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html


“All we have to see / Is I don’t belong to you / And you don’t belong to me.”

George Michael

Freedom! ’90


Common Amazon Aurora migration options

• From RDS: console-based automated snapshot ingestion, then catch-up via binlog replication

• From EC2 or on premises: binary snapshot ingestion through S3, then catch-up via binlog replication

• From EC2, on premises, or RDS: schema conversion using the AWS Schema Conversion Tool (SCT) and data migration via AWS Database Migration Service (DMS)


Thank you!

Chayan Biswas

cbbiswas@amazon.com
