
Reducing Risk When Upgrading MySQL

Jan 21, 2018


Kenny Gryp

Reducing Risk When Upgrading Your MySQL Environment

Kenny Gryp MySQL Practice Manager


My Experience as MySQL Consultant On Upgrading MySQL
it's quite complex...


Table of Contents

The Official Documentation

Make Your Own Documentation

Potential Risks

Establish Upgrade Method For A Single Server

Rollback Scenario Testing

Test Writes

Test Individual Reads

Workload Testing

Establish (& Test) Migration Process

Migration In Production

(Rollback)

Post-Migration Assessment


The Official Documentation


Oracle's Recommended Process

Back up your data
Read all release notes and assess
    https://dev.mysql.com/doc/relnotes/mysql/5.7/en/
Read Changes Affecting Upgrades to MySQL 5.7
    https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-previous-series.html


Upgrade Slaves First

In-Place Upgrade:

    Clean shutdown (innodb_fast_shutdown=0)
    Run mysql_upgrade

Logical Upgrade:

    mysqldump data
    Import data again
    Run mysql_upgrade to fix mysql schema

http://dev.mysql.com/doc/refman/5.7/en/upgrading.html


Oracle's Recommended Process (cont.)

A Lot of Risk:

    No guarantee queries will execute the same
    No guarantee queries will run at the same speed or faster
    No guarantee all your queries will still work (new default stricter sql_mode)

There is no official support to upgrade from <5.6 to 5.7

but we might actually be able to do that


do-it-yourself

Documenting The Process


Documenting The Process

PEBKAC: Human errors happen and create issues

    importing data using the wrong character set
    setting up a replica using the wrong binlog file/pos
    ...

Document every step: we need to repeat it multiple times


making you afraid to upgrade by describing

Potential Risks


Optimizer Changes

Example: index_merge_intersection

    Often seen during migrations to MySQL 5.6
    Affects environments with sub-optimal indexing
    Queries with c1='a' AND c2='b' when composite index (c1,c2) is missing
    Is often slower when the selectivity of 1 of the 2 columns is bad (and it happens frequently)
    Result: a lot of queries were slower in the new environment

Need SELECT performance tests between versions

https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/
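
To see why intersecting two single-column indexes can lose against a plain single-index lookup, here is a deliberately simplified sketch in Python. The cost model (number of index entries read) and the function names are illustrative assumptions. MySQL's real optimizer cost calculation is more involved.

```python
def entries_read_index_merge(rows_matching_c1: int, rows_matching_c2: int) -> int:
    """Toy model: an index_merge intersection scans the matching entries of
    BOTH single-column indexes before intersecting the row ids."""
    return rows_matching_c1 + rows_matching_c2


def entries_read_single_index(rows_matching_c1: int, rows_matching_c2: int) -> int:
    """Toy model: use only the more selective index, then filter the other
    condition on the rows that were fetched."""
    return min(rows_matching_c1, rows_matching_c2)


# Hypothetical table: c1='a' matches 50 rows, c2='b' matches 2,000,000 rows.
merge_cost = entries_read_index_merge(50, 2_000_000)
single_cost = entries_read_single_index(50, 2_000_000)
print(merge_cost, single_cost)
```

In this toy model the poorly selective column dominates the cost of the intersection, which is the situation the slide describes.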


New Defaults In MySQL 5.7

The new defaults in MySQL 5.7 make a lot of sense:

    More use of available features and performance enhancements out of the box
    More strictness with data/query validation

New Reserved words
Applications might not be ready for it.

    Drupal 7 - https://www.drupal.org/node/2545480

They will/might break the application more easily:

    sql_mode=ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
    innodb_strict_mode=1

Needs SELECT & DML query validity tests between versions
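
A quick way to see which strictness flags an application will newly face is to diff the old and new sql_mode values. The sketch below is a minimal Python illustration. It assumes the old server runs with the MySQL 5.6 compiled-in default (NO_ENGINE_SUBSTITUTION), so substitute the actual output of SELECT @@GLOBAL.sql_mode from your own server. The helper names are hypothetical.

```python
# sql_mode default for MySQL 5.7 as quoted on the slide
SQL_MODE_57 = (
    "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,"
    "ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
)
# Assumption: old server uses the 5.6 compiled-in default; replace with your value.
SQL_MODE_OLD = "NO_ENGINE_SUBSTITUTION"


def parse_sql_mode(value: str) -> set:
    """Split a comma-separated sql_mode string into a set of mode names."""
    return {mode.strip().upper() for mode in value.split(",") if mode.strip()}


def added_modes(old: str, new: str) -> list:
    """Return the modes that are active in `new` but not in `old`."""
    return sorted(parse_sql_mode(new) - parse_sql_mode(old))


for mode in added_modes(SQL_MODE_OLD, SQL_MODE_57):
    print("newly enforced:", mode)
```

Each mode printed here is a candidate source of queries that worked before the upgrade and fail or warn afterwards.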


Other Changes in MySQL 5.7

Support for passwords that use the older pre-4.1 password hashing format is removed.


Minor Versions Also At Risk

CREATE TABLE date (d DATE);
INSERT INTO date VALUES ('2017-04-19');
SELECT COUNT(*) FROM date WHERE d < NOW()-INTERVAL 1 DAY;

MySQL 5.0.37

+-------+
| 0     |
+-------+

MySQL 5.0.45

+-------+
| 1     |
+-------+

Seen with DELETE FROM date WHERE d < NOW()-INTERVAL 1 DAY in binlog_format=STATEMENT environments.
Needs SELECT & DML query result tests between versions
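
One plausible way for the same query to return 0 on one version and 1 on another is a change in how a DATE column is compared with a DATETIME expression. The slide does not state the cause, so treat the following Python sketch as an illustrative assumption. It only shows that the two interpretations disagree for the same data. The value used for NOW() is hypothetical.

```python
from datetime import date, datetime, time, timedelta


def matches_as_date(d: date, now: datetime) -> bool:
    """Compare as DATE: the time part of the cutoff is dropped."""
    cutoff = now - timedelta(days=1)
    return d < cutoff.date()


def matches_as_datetime(d: date, now: datetime) -> bool:
    """Compare as DATETIME: the DATE value is extended with 00:00:00."""
    cutoff = now - timedelta(days=1)
    return datetime.combine(d, time.min) < cutoff


d = date(2017, 4, 19)
now = datetime(2017, 4, 20, 12, 0, 0)  # hypothetical value of NOW()
print(matches_as_date(d, now), matches_as_datetime(d, now))
```

With statement-based replication the DELETE is re-evaluated on the replica, so a difference like this can silently remove different rows on each side.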


Workload

sync_binlog=1 by default in MySQL 5.7

    Can impact certain environments, might not be noticed when looking at a single query

InnoDB LRU Flushing changes require tuning for heavy workloads in 5.6 (innodb_lru_scan_depth)
When switching to MySQL 8.0 with the new data dictionary...
Need to do Workload Testing between versions

http://mysqlentomologist.blogspot.com/2015/10/fun-with-bugs-38-regression-bugs-in.html
http://lefred.be/content/sync_binlog-1-in-5-7/


How Do We Reduce All This Risk?


Testing!


establish

Upgrade Method For A Single Server


Upgrade Method For A Single Server

Follow MySQL documentation:
    http://dev.mysql.com/doc/refman/5.7/en/upgrading.html
Make sure to document every command
Restore from backup
Or take a replica you can spare


Replication Consistency

Testing Writes


Writes - Replication Consistency

pt-table-checksum: validate consistency in a replication topology
Identify problems caused by PEBKAC
Ensure events replicate properly (binlog_format=STATEMENT)
Upgrade a replica or add a replica which is using the modified version.
Do it on production: test/staging will give no meaningful result

https://www.percona.com/doc/percona-toolkit/3.0/pt-table-checksum.html


Writes - Replicate Test Server


often left behind is

Rollback Scenario Testing


Rollback Scenario Testing

Possibility to fall back in case something went wrong during migration
Can be done using replication, but has to be tested!


Writes - Rollback Testing


Rollback Scenario Testing

You might need to change some settings in your new my.cnf to be able to support replicating back.

Example:

binlog_checksum = NONE
binlog_row_image = FULL
binlog_rows_query_log_events = OFF
log_bin_use_v1_row_events = 1
gtid_mode = OFF
log_slave_updates = 1
skip-slave-start
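
Because these settings are easy to forget, it can help to check a my.cnf mechanically before starting a rollback test. The following is a minimal Python sketch under stated assumptions: it only looks at the [mysqld] section, uses the standard-library configparser (which does not understand MySQL-specific directives such as !includedir), and the function name missing_rollback_settings is hypothetical.

```python
import configparser

# Settings taken from the example on the slide; keys normalised to underscores.
REQUIRED = {
    "binlog_checksum": "NONE",
    "binlog_row_image": "FULL",
    "binlog_rows_query_log_events": "OFF",
    "log_bin_use_v1_row_events": "1",
    "gtid_mode": "OFF",
    "log_slave_updates": "1",
}
REQUIRED_FLAGS = {"skip_slave_start"}  # options that take no value


def missing_rollback_settings(my_cnf_text: str) -> list:
    """Return the rollback-related settings that are absent or set differently."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(my_cnf_text)
    found = {}
    if parser.has_section("mysqld"):
        for key, value in parser.items("mysqld"):
            found[key.replace("-", "_")] = value
    problems = []
    for key, expected in REQUIRED.items():
        actual = found.get(key)
        if actual is None or actual.strip().upper() != expected:
            problems.append(key)
    problems.extend(sorted(REQUIRED_FLAGS - set(found)))
    return problems


sample = """
[mysqld]
binlog_checksum = NONE
binlog_row_image = FULL
gtid_mode = ON
log_slave_updates = 1
skip-slave-start
"""
print(missing_rollback_settings(sample))
```

The sample configuration is made up for illustration. It omits two settings and has gtid_mode set to ON, so those three names are reported.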


Writes - Checksums - GTID


Writes - Checksums - Non-GTID


Writes - Checksums - ROW


Where To Run pt-table-checksum?

GTID:

    pt-table-checksum can only be run on Master (Errant Transactions)
    Or scratch the pt-table-checksum host after tests

non-GTID:

    pt-table-checksum can be run on intermediate master
    binlog_format=ROW:
        only 1 tier below can be checksummed
        run on every tier that has a replica (for rollback)

pt-table-checksum can bring prod overhead when run on active master
Let replication run for a while before checksumming


pt-table-checksum results

On every replica (including rollback):

SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksum
WHERE (master_cnt <> this_cnt
    OR master_crc <> this_crc
    OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;

+----+-----------------+------------+--------+
| db | tbl             | total_rows | chunks |
+----+-----------------+------------+--------+
| db | telephone_debit |      44342 |      1 |
| db | orderline       |      21451 |      3 |
| db | orders          |   25125215 |     12 |
+----+-----------------+------------+--------+
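
The WHERE clause above flags a chunk when the row counts differ, the checksums differ, or only one side has a checksum. The same logic can be sketched in Python over rows that have already been fetched, for example to aggregate results from several replicas in a script. The function names and the second sample row are hypothetical. The first sample row uses the chunk values shown later in this deck.

```python
from collections import defaultdict


def _sql_ne(a, b):
    """Mimic SQL `a <> b`: a comparison involving NULL (None) is not true."""
    return a is not None and b is not None and a != b


def summarize_differences(rows):
    """Group inconsistent chunks per (db, tbl), like the GROUP BY query."""
    summary = defaultdict(lambda: {"total_rows": 0, "chunks": 0})
    for r in rows:
        differs = (
            _sql_ne(r["master_cnt"], r["this_cnt"])
            or _sql_ne(r["master_crc"], r["this_crc"])
            or (r["master_crc"] is None) != (r["this_crc"] is None)
        )
        if differs:
            entry = summary[(r["db"], r["tbl"])]
            entry["total_rows"] += r["this_cnt"] or 0
            entry["chunks"] += 1
    return dict(summary)


rows = [
    {"db": "db", "tbl": "telephone_debit", "this_cnt": 44342, "this_crc": "7fd37eb9",
     "master_cnt": 44342, "master_crc": "b7babd94"},
    # Hypothetical consistent chunk: same count and checksum on both sides.
    {"db": "db", "tbl": "telephone_debit", "this_cnt": 1000, "this_crc": "aaaa0000",
     "master_cnt": 1000, "master_crc": "aaaa0000"},
]
print(summarize_differences(rows))
```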


pt-table-checksum - Analysis

Troubleshooting starts now...

What went wrong?


pt-table-checksum - Analysis

Which chunks failed?

db: db
tbl: telephone_debit
chunk: 100
chunk_time: 0.4956125
chunk_index: PRIMARY
lower_boundary: 5014733
upper_boundary: 5059074
this_crc: 7fd37eb9
this_cnt: 44342
master_crc: b7babd94
master_cnt: 44342
ts: 2013-02-05 01:59:48



pt-table-checksum - Analysis

SELECT * INTO outfile '/tmp/telephone_debit_mysql56'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;

SELECT * INTO outfile '/tmp/telephone_debit_mysql57'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;

# diff -u /tmp/telephone_debit_mysql5{6,7}

Use twindb_table_compare! https://github.com/twindb/twindb_table_compare
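
If the two dumps need to be compared from a script rather than with diff -u, the standard-library difflib module produces the same kind of unified diff. This is a minimal sketch: the row contents are hypothetical (a floating point difference of the kind mentioned in this deck), and in practice the lists would be read from the two outfiles.

```python
import difflib


def diff_rows(old_rows, new_rows, old_name="mysql56", new_name="mysql57"):
    """Return unified-diff lines between two lists of dumped rows."""
    return list(
        difflib.unified_diff(
            old_rows, new_rows, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )


# Hypothetical tab-separated rows as written by SELECT ... INTO OUTFILE.
rows_56 = ["5014733\t10.10", "5014734\t0.30"]
rows_57 = ["5014733\t10.10", "5014734\t0.30000000000000004"]

for line in diff_rows(rows_56, rows_57):
    print(line)
```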


pt-table-checksum - Analysis

Wrong upgrade method

    backups
    wrong replication file/pos...
    binlog_format=STATEMENT using (UUID()...)

Commonly seen issues when replicating from older versions:

    Floating point differences: Storing currencies in a DOUBLE
    Temporal data types
    Invalid dates converted to zero dates
    Trailing spaces in CHAR fields


Testing Writes

Consistency Checks Process:

Checksum
Check for differences
    On new environment
    On rollback environment
For each inconsistency
    Analyze diff
    Find root cause
    Fix problem
    Document problem & solution
Repeat checksum again


Testing Individual Reads


Testing Reads - Collect Queries


Collection Techniques:

Slow Query Log
    long_query_time=0
        Careful above ~10,000 QPS
        Percona Server: log_slow_rate_limit
tcpdump
    'packets lost' in libpcap
Application/Load Balancer queries

Ensure:

    Get the full workload (long enough)
    Get data from Master & Replicas
    Collect batch job queries running at night

https://www.percona.com/doc/percona-server/5.7/diagnostics/slow_extended.html


Testing Reads - Setup 2 Environments


Need 2 Test Servers:

    Reuse servers from checksum + rollback
    Ensure they have the same data (break replication at same time)
    Same HW specifications
    Similar Configurations on buffer pool, flatc...
    Fast enough to more or less resemble production
    Optionally can be done using 1 machine (pt-upgrade --save-results)


Testing Reads - pt-upgrade


pt-upgrade:

    runs one query at a time on both test environments
    compares differences:
        warnings/errors
        resultset (even different order)
        query response time

Run pt-upgrade on third host with similar network latency
Run twice to warm up buffer pool first (need to be equal)
Can also compare writes for execution time & warnings
Filter slowlog initially to limit similar queries

    pt-query-digest --no-report --output slowlog --sample 20

https://www.percona.com/doc/percona-toolkit/3.0/pt-upgrade.html
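
The idea behind limiting similar queries is to group queries by a normalised "fingerprint" and keep only the first few of each group. The Python sketch below illustrates that idea only. The fingerprint function is a simplified assumption and not pt-query-digest's actual algorithm, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def fingerprint(query: str) -> str:
    """Very rough query normalisation: lower-case, replace literals with '?'."""
    q = query.strip().lower()
    q = re.sub(r"'(?:[^'\\]|\\.)*'", "?", q)  # quoted string literals
    q = re.sub(r"\b\d+\b", "?", q)            # numeric literals
    q = re.sub(r"\s+", " ", q)                # collapse whitespace
    return q


def sample_queries(queries, samples=20):
    """Keep at most `samples` queries per fingerprint, preserving order."""
    seen = defaultdict(int)
    kept = []
    for q in queries:
        fp = fingerprint(q)
        if seen[fp] < samples:
            seen[fp] += 1
            kept.append(q)
    return kept


queries = [
    "SELECT * FROM t WHERE id = 1",
    "SELECT * FROM t WHERE id = 2",
    "SELECT * FROM t WHERE id = 3",
    "SELECT name FROM u WHERE name = 'bob'",
]
print(sample_queries(queries, samples=2))
```

Sampling keeps the pt-upgrade run short while still exercising every distinct query shape with a few different parameter values.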


Testing Reads - pt-upgrade

Reporting class because there are 1000 row diffs.

Total queries      10
Unique queries     10
Discarded queries  0

select ... from ...

##
## Row diffs: 10
##

-- 1.

@ row 2
< 13178,"dim0",37,2,21,,,0,0,0,1,NULL,NULL
> 13178,"dimø",37,2,21,,,0,0,0,1,NULL,NULL
...


Testing Reads - pt-upgrade

Reporting class because it has diffs, but hasn't been reported yet.

SELECT * FROM `database`.table WHERE treeid = '' AND productid='0'

## Warning diffs: 2

Code: 1366
Level: Warning
Message: Incorrect integer value: '' for column 'treeid' at row 1

vs.

No warning 1366


Testing Reads - pt-upgrade

SELECT *
FROM `database`.client_orders
WHERE client=? AND blacklist=? LIMIT ?

## Query time diffs: 1

-- 1.

0.000513 vs. 0.036395 seconds (70.9x increase)

SELECT *
FROM `database`.client_orders
WHERE client=57450 AND blacklist=1 LIMIT 1
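
The reported increase is simply the ratio of the two response times. A short Python check of the arithmetic follows, with a hypothetical threshold helper for deciding which entries deserve a closer look. The 2.0 threshold is an assumption and not a pt-upgrade setting.

```python
def slowdown(old_seconds: float, new_seconds: float) -> float:
    """Ratio of the new response time to the old one."""
    return new_seconds / old_seconds


def is_regression(old_seconds: float, new_seconds: float, threshold: float = 2.0) -> bool:
    """Flag a query when it became at least `threshold` times slower."""
    return slowdown(old_seconds, new_seconds) >= threshold


ratio = slowdown(0.000513, 0.036395)
print(f"{ratio:.1f}x increase")
print(is_regression(0.000513, 0.036395))
```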


Testing Reads Process

Collect queries
Run pt-upgrade (twice)
For each entry in report
    Figure out why it is reported
    Deploy fix in Prod Application
    Make schema changes
    Document analysis
Run pt-upgrade again


one of the most challenging is

Testing Workload


Workload Testing - Percona Playback


Workload Testing - Query Playback

Uses slowlog to replay queries

    Needs long_query_time=0 - challenging on busy servers
    Enough data during peak workload

Tries to execute workload as realistically as possible
    same connections, same transactions, same delays between queries
Run against both environments, compare speed

    Think about preloading buffer on both the same way

Active development by Marius Wachtler (ex)-DropBox! Thank you!
(unofficial product of Percona, no support)


Workload Testing - ProxySQL Mirroring


Mirror queries from Load Balancer to test environment
Good Blogpost: https://www.pythian.com/blog/using-proxysql-validate-mysql-updates/


establish (& test)

Migration Process


Migration Process

Create Migration Plan

    Different for every environment/application
    Upgrade a replica first for a couple of days/weeks?
    How to switch masters?
        How is failover being handled nowadays? MHA, Orchestrator, Manual, GTID/mysqlrpladmin...?

Test in staging!


the actual

Migration In Production


Migration - Create Slave Environments


Migration - Redirect Read Traffic


Migration - Application Switchover - 1


Migration - Application Switchover - 2


you (think you) will never need to do a

Rollback


Rollback

What went wrong?
I did not follow the full process! (or I forgot to document it)
Do consistency checks again!


after all that testing, it's ok to spend time doing

Post-Migration Assessment


Post-Migration

Check trending for different behavior

    more cpu load?
    more disk IO?
    higher amount of innodb_rows_* and handler_*
    threads_running stability?
    do some query optimization

If all looks good, scratch the 5.6 rollback & make it 5.7
Remove the rollback specific configuration options
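
Checking trending can be as simple as comparing per-second rates of a few status counters from before and after the migration. The Python sketch below assumes the rates have already been collected, for example from the deltas of two SHOW GLOBAL STATUS samples. All numbers, the 25% threshold, and the function name are hypothetical.

```python
def changed_counters(before: dict, after: dict, threshold: float = 0.25) -> dict:
    """Return counters whose rate changed by at least `threshold` (relative)."""
    report = {}
    for name in sorted(before.keys() & after.keys()):
        old, new = before[name], after[name]
        if old == 0:
            continue  # relative change is undefined without a baseline
        change = (new - old) / old
        if abs(change) >= threshold:
            report[name] = round(change, 2)
    return report


# Hypothetical per-second rates before and after the migration.
before = {"Innodb_rows_read": 120000.0, "Handler_read_next": 90000.0, "Threads_running": 8.0}
after = {"Innodb_rows_read": 310000.0, "Handler_read_next": 95000.0, "Threads_running": 9.0}
print(changed_counters(before, after))
```

A counter that jumps this much without a matching change in traffic is a hint that some query plans changed and deserve optimization.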


Post Migration Cleanup


small recap

Summary


Multi-Use

(Minor MySQL version upgrades)
Major MySQL version upgrades
Switching Hardware from Intel -> AMD architecture
Using a new kernel/libc/memory allocator
Switching storage engines
MariaDB/Percona Server/MySQL...


Do I really have to go through this?

Many success stories:

    Have done several MySQL upgrades from 4.1 -> 5.5 without intermediate slaves
    Upgraded environments with major schema changes in the mix (mssql-style environments using stored procedures only)
    Found numerous application bugs using this process
    Optimized many customers' schemas/queries in the meantime

As long as you follow this process completely, the risk of running into problems is quite small.


Do I really have to go through this?

It Depends:

    Your business might be risk-averse: every change has to be thoroughly tested
    Other companies just upgrade a replica in production and see how it goes

My suggestion is to do this at least for:

    Major MySQL version upgrades
    Switching storage engines


Summary

Test Step                         Skip?

Document Upgrade Single Server    Really? Why?
Rollback Scenarios                Not Recommended
Consistency Checks                Required, No Debate!
Read Tests                        Strongly Suggested
Workload Tests                    Possible (Early Adopter Alert)
Migration Tests                   Not Recommended To Skip


Reducing Risk When Upgrading Your MySQL Environment
Q&A!

Kenny Gryp MySQL Practice Manager