Goldilocks and the Three MySQL Queries

Goldilocks And The Three Queries – MySQL's EXPLAIN Explained

Dave Stokes, MySQL Community Manager, North [email protected]

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Please Read

Simple Introduction

EXPLAIN & EXPLAIN EXTENDED are tools to help optimize queries. As tools, they are only as good as the craftspeople using them. There is more to this subject than can be covered in a single presentation, but hopefully this session will start you out on the right path for using EXPLAIN.

Why worry about the optimizer?

The client sends a statement to the server.

The server checks the query cache to see whether it has already run the statement. If so, it retrieves the stored result and sends it back to the client.

The statement is parsed, preprocessed, and optimized to make a Query Execution Plan (QEP).

The query execution engine carries out the QEP by making calls to the storage engine API.

The results are sent to the client.

Once upon a time ...

There was a PHP programmer named Goldilocks who wanted the phone number of her friend Little Red Riding Hood in Networking. She found an old, dusty piece of code in the enchanted programmers' library. Inside the code was a special chant to get all the names and phone numbers of the employees of Grimm-Fayre-Tails Corp. And so, Goldi tried that special chant!

SELECT name, phone
FROM employees;

Oh-No!

But the special chant kept running, and running, and running.

Eventually Goldi hit Control-C when she heard that Grimm had hired many, many folks: the company had 10^10 employees in the database.

A second chant

Goldi did some searching in the library and learned she could add to the chant to look only for her friend Red.

SELECT name, phone
FROM employees
WHERE name LIKE 'Red%';

Goldi crossed her fingers, held her breath, and let 'er rip.

What she got

Name, phone
Redford, 1234
Redmund, 2323
Redlegs, 1234
Red Sox, 1914
Redding, 9021

But this was not what Goldilocks needed, so she asked a kindly old Java Owl for help.

The Owl's chant

'Ah, you want the nickname field!' He re-crafted her chant.

SELECT first, nick, last, phone, `group`
FROM employees
WHERE nick LIKE '%red%';

Still too much data … but better

Betty, Big Red, Lopez, 4321, Accounting
Ethel, Little Red, Riding-Hoode, 127.0.0.1, Networks
Agatha, Red Herring, Christie, 007, Public Relations
Johnny, Reds Catcher, Bench, 421, Gaming

'We can tune the query better,' cried the Owl.

SELECT first, nick, last, phone, `group`
FROM employees
WHERE nick LIKE 'Red%'
AND `group` = 'Networking';

But once Goldi had the data she needed, she was too busy to listen.
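To see why the Owl's last chant is the better one, here is a minimal sketch. The `employees` schema and the index name `idx_nick` are hypothetical (the story never shows the real table), and the plans only become meaningful once the table holds a realistic amount of data.

```sql
-- Hypothetical schema for the story's employees table.
-- `group` is a reserved word in MySQL, so it must be quoted with backticks.
CREATE TABLE employees (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first   VARCHAR(40) NOT NULL,
  nick    VARCHAR(40),
  last    VARCHAR(40) NOT NULL,
  phone   VARCHAR(20),
  `group` VARCHAR(40),
  INDEX idx_nick (nick)          -- hypothetical index name
);

-- Leading wildcard: idx_nick cannot be used to seek to matching rows,
-- so expect type = ALL (a full table scan).
EXPLAIN SELECT first, nick, last, phone, `group`
FROM employees
WHERE nick LIKE '%red%';

-- Fixed prefix: idx_nick can be used for a range scan,
-- and the `group` test then filters the rows it finds.
EXPLAIN SELECT first, nick, last, phone, `group`
FROM employees
WHERE nick LIKE 'Red%'
  AND `group` = 'Networking';
```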

The preceding were obviously flawed queries

• But how do you check if queries are running efficiently?

• What does the query that the MySQL server actually runs look like (the dreaded Query Execution Plan)? What is cost-based optimization?

• How can you make queries faster?

EXPLAIN & EXPLAIN EXTENDED

EXPLAIN [EXTENDED | PARTITIONS]
{
    SELECT statement
  | DELETE statement
  | INSERT statement
  | REPLACE statement
  | UPDATE statement
}

Or EXPLAIN tbl_name (same as DESCRIBE tbl_name)
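A few statements in the forms above, sketched against the World sample database used later in this talk (assuming it is loaded as a schema named `world`). EXPLAIN on an UPDATE requires MySQL 5.6 or later and does not modify any rows.

```sql
USE world;

-- EXPLAIN on a SELECT: show the plan without running the query
EXPLAIN SELECT Name FROM City WHERE ID = 1;

-- EXPLAIN on an UPDATE (MySQL 5.6+): the plan only; the row is not changed
EXPLAIN UPDATE City SET Population = Population + 1 WHERE ID = 1;

-- EXPLAIN with a table name is a synonym for DESCRIBE
EXPLAIN City;
DESCRIBE City;
```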

What is being EXPLAINed

Prepending EXPLAIN to a statement* asks the optimizer how it would plan to execute that statement at the lowest cost (measured in disk page seeks*). Sometimes it guesses wrong.

What it can tell you:
– Where to add INDEXes to speed row access
– Check JOIN order

And Optimizer Tracing (more later) has been recently introduced!

* SELECT, DELETE, INSERT, REPLACE & UPDATE as of 5.6; only SELECT in 5.5 & previous
* Does not know if a page is in memory or on disk (that is the storage engine's problem, not the optimizer's); see MySQL Manual 7.8.3

The Columns

id Which SELECT

select_type The SELECT type

table Output row table

type JOIN type

possible_keys Potential indexes

key Actual index used

key_len Length of actual index

ref Columns used against index

rows Estimate of rows

extra Additional Info

A first look at EXPLAIN... using the World database

Will read all 4079 rows – all the rows in this table
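The slide's screenshot is not preserved; a query with no WHERE clause, such as the one below, is one way to reproduce it (an assumption about what was shown). `\G` is the mysql command-line client's vertical-output terminator.

```sql
-- No WHERE clause, so every row must be read:
-- expect type = ALL and a rows estimate close to the table's 4079 rows.
EXPLAIN SELECT * FROM City\G
```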

EXPLAIN EXTENDED -> query plan

Filtered: estimated % of rows filtered by the condition

The query as seen by server (kind of, sort of, close)
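A sketch of getting both the filtered column and the server's version of the query. This is MySQL 5.5/5.6 syntax; from 5.7 plain EXPLAIN shows the filtered column, and the EXTENDED keyword was removed in 8.0. The WHERE clause here is our own example.

```sql
-- EXTENDED adds the filtered column to the plan.
EXPLAIN EXTENDED SELECT Name, Population FROM City WHERE Population > 1000000;

-- Run immediately afterwards: the Note it returns holds
-- the query as rewritten by the optimizer.
SHOW WARNINGS;
```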

Add in a WHERE clause
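Adding a condition on a column that has no index still leaves the optimizer no choice but a full scan. District is our choice of column here, not necessarily the one on the original slide.

```sql
-- District is not indexed in World.City, so expect type = ALL,
-- a rows estimate still near 4079, and Extra = Using where.
EXPLAIN SELECT Name, Population
FROM City
WHERE District = 'Texas';
```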

Time for a quick review of indexes

Advantages
– Go right to the desired row(s) instead of reading ALL ROWS
– Smaller than the whole table (read from disk faster)
– Can 'carry' other data with compound indexes

Disadvantages
– Overhead* on CRUD operations (every write must also update the index)
– Not used on full table scans

* May need to run ANALYZE TABLE to update statistics such as cardinality to help the optimizer make better choices
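Refreshing and then inspecting those statistics might look like this on World.City:

```sql
-- Recompute index statistics (such as cardinality) used by the optimizer.
ANALYZE TABLE City;

-- The Cardinality column shows the estimated number of distinct values per index.
SHOW INDEX FROM City;
```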

Quiz: Why read 4079 rows when only five are needed?

Information in the type Column

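ALL – full table scan (to be avoided when possible)
CONST – WHERE ID=1 (primary key or unique index compared to a constant; at most 1 row)
EQ_REF – WHERE a.ID = b.ID (uses indexes, 1 row returned)
REF – WHERE state='CA' (multiple rows for key values)
REF_OR_NULL – WHERE state='CA' OR state IS NULL (like REF, plus an extra lookup for NULL)
INDEX_MERGE – WHERE ID = 10 OR state = 'CA'
RANGE – WHERE x IN (10,20,30)
INDEX – full scan of the index rather than the table (usually faster when index file < data file)
UNIQUE_SUBQUERY – value IN (SELECT primary_key FROM ...) resolved through a unique index
INDEX_SUBQUERY – like UNIQUE_SUBQUERY, but through a non-unique index
SYSTEM – Table with only 1 row (a special case of CONST)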
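A few of these join types can be produced on World.City. The comments give the type we would expect; the optimizer may choose differently depending on version and statistics.

```sql
EXPLAIN SELECT Name FROM City WHERE ID = 1;                -- const (primary key = constant)
EXPLAIN SELECT Name FROM City WHERE CountryCode = 'NLD';   -- ref (non-unique CountryCode index)
EXPLAIN SELECT Name FROM City WHERE ID IN (10, 20, 30);    -- range (several primary key values)
EXPLAIN SELECT Name FROM City WHERE Population > 1000000;  -- ALL (Population is not indexed)

-- City is read first (range on ID); each City row then matches
-- at most one Country row by primary key, typically shown as eq_ref.
EXPLAIN SELECT City.Name, Country.Name
FROM City
JOIN Country ON Country.Code = City.CountryCode
WHERE City.ID < 10;
```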

Full table scans VS Index

So let's create a copy of the World.City table that has no indexes. The optimizer estimates that it would have to read 4,279 rows to find the desired record, about 5% more than the 4,079 rows the table actually contains.
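A sketch of that experiment. `City_noindex` is a hypothetical name for the copy; CREATE TABLE ... SELECT copies the data but does not create any indexes, which is exactly what this test needs.

```sql
-- Copy the data only: no PRIMARY KEY, no CountryCode index.
CREATE TABLE City_noindex AS SELECT * FROM City;   -- hypothetical table name

-- Without an index on ID, even a single-row lookup is a full scan:
-- expect type = ALL and a rows estimate near the table size.
EXPLAIN SELECT Name FROM City_noindex WHERE ID = 3801;
```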

How does NULL change things?

Taking NOT NULL away from the ID field (plus removing the previous index) increases the estimated rows read to 4,296! That is roughly 5.3% more rows than are actually in the table.

Running ANALYZE TABLE reduces the estimate to 3,816, still far more than the single row needed.
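The NULL experiment can be sketched the same way. `City_nullable` is a hypothetical name for a fresh index-free copy; the estimates you get will depend on your version and storage engine.

```sql
-- Copy of City with no indexes (CREATE TABLE ... SELECT builds none).
CREATE TABLE City_nullable AS SELECT * FROM City;   -- hypothetical table name

-- Remove the NOT NULL constraint from ID.
ALTER TABLE City_nullable MODIFY ID INT NULL;

-- Estimated rows for a single-row lookup...
EXPLAIN SELECT Name FROM City_nullable WHERE ID = 3801;

-- ...and again after refreshing the table statistics.
ANALYZE TABLE City_nullable;
EXPLAIN SELECT Name FROM City_nullable WHERE ID = 3801;
```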

Both of the following return 1 row

EXPLAIN PARTITIONS – add 12 hash partitions to City

Some parts of your query may be hidden!!
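A sketch of the partitioned copy, using the hypothetical name `City_part`. CREATE TABLE ... LIKE keeps the primary key on ID, and HASH(ID) partitioning is allowed because ID is part of every unique key. EXPLAIN PARTITIONS is MySQL 5.6 syntax; from 5.7 the partitions column is part of plain EXPLAIN.

```sql
CREATE TABLE City_part LIKE City;                       -- hypothetical table name
ALTER TABLE City_part PARTITION BY HASH(ID) PARTITIONS 12;
INSERT INTO City_part SELECT * FROM City;

-- Equality on the partitioning column: pruning should leave
-- a single partition in the partitions column.
EXPLAIN PARTITIONS SELECT Name FROM City_part WHERE ID = 1000;

-- No condition on ID: all 12 partitions have to be listed.
EXPLAIN PARTITIONS SELECT Name FROM City_part WHERE CountryCode = 'NLD';
```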

Latin1 versus UTF8

Create a copy of the City table, but with the UTF8 character set replacing Latin1. The key_len of the three-character key grows from 3 bytes to 9 bytes. That is more data to read and more to compare, which is pronounced 'slower'.
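A sketch of that comparison, assuming (as on the slide) that City uses latin1 and that its CountryCode column is CHAR(3) NOT NULL. `City_utf8` is a hypothetical name. In MySQL 8.0, utf8 means utf8mb3; converting to utf8mb4 instead would reserve 4 bytes per character.

```sql
-- Copy City, then convert every character column to utf8 (up to 3 bytes per character).
CREATE TABLE City_utf8 LIKE City;                       -- hypothetical table name
ALTER TABLE City_utf8 CONVERT TO CHARACTER SET utf8;
INSERT INTO City_utf8 SELECT * FROM City;

-- Same lookup on the CountryCode index; compare the key_len column:
-- 3 bytes for latin1 versus 9 bytes for utf8.
EXPLAIN SELECT Name FROM City      WHERE CountryCode = 'NLD';
EXPLAIN SELECT Name FROM City_utf8 WHERE CountryCode = 'NLD';
```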

INDEX Length

If we add a new index on just the first 2 bytes of CountryCode, does it work as well as the original 3-byte index?

Forcing use of new shorter index ...

Still generates a guesstimate that 39 rows must be read.

In some cases there is performance to be gained in using shorter indexes.
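The prefix-index experiment can be reproduced on a copy. `City_cc2` and `idx_cc2` are hypothetical names, and the copied full-length index is assumed to be called CountryCode, as in the World sample's City table.

```sql
CREATE TABLE City_cc2 LIKE City;                        -- hypothetical table name
INSERT INTO City_cc2 SELECT * FROM City;

-- Index only the first 2 characters of the 3-character CountryCode.
ALTER TABLE City_cc2 ADD INDEX idx_cc2 (CountryCode(2));

-- Force the shorter index, then the original one, and compare key_len and rows.
EXPLAIN SELECT Name FROM City_cc2 FORCE INDEX (idx_cc2)     WHERE CountryCode = 'NLD';
EXPLAIN SELECT Name FROM City_cc2 FORCE INDEX (CountryCode) WHERE CountryCode = 'NLD';
```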

Subqueries

Subqueries may be run as part of EXPLAIN execution, which can cause significant overhead, so be careful when testing.

Note here that #1 is not using an index. That is why we recommend rewriting subqueries as joins.
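A sketch of such a rewrite on the World database (the query is ours, not the slide's). The two forms return the same rows because Country.Code is the primary key, so the join cannot duplicate City rows.

```sql
-- Subquery form: MySQL 5.5 and earlier often run this as a
-- DEPENDENT SUBQUERY, re-evaluated for each City row.
EXPLAIN SELECT Name
FROM City
WHERE CountryCode IN (SELECT Code FROM Country WHERE Continent = 'Europe');

-- Equivalent join form: the optimizer can pick the join order
-- and use the CountryCode index on City.
EXPLAIN SELECT City.Name
FROM City
JOIN Country ON City.CountryCode = Country.Code
WHERE Country.Continent = 'Europe';
```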

EXAMPLE of covering Indexing

In this case, adding an index reduces the reads from 239 to 42.
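A sketch of a single-column index on Country, which has 239 rows. The query and the index name `idx_continent` are assumptions, since the slide's own query is not preserved.

```sql
-- Before: Continent is not indexed, so expect type = ALL over all 239 rows.
EXPLAIN SELECT Name FROM Country
WHERE Continent = 'Asia' AND GovernmentForm = 'Republic';

ALTER TABLE Country ADD INDEX idx_continent (Continent);   -- hypothetical index name

-- After: expect type = ref on idx_continent with a much smaller rows estimate.
EXPLAIN SELECT Name FROM Country
WHERE Continent = 'Asia' AND GovernmentForm = 'Republic';
```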

Can we do better for this query?

Index on both Continent and Government Form

With both Continent and GovernmentForm indexed together, we go from 42 rows read to 19.

Using index means the data is retrieved from the index, not the table (good)

Using index condition means evaluation is pushed down to the storage engine (Index Condition Pushdown). This can reduce the storage engine's reads of the table and the server's reads of the storage engine (not bad)
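A sketch of the composite version; `idx_cont_gov` is a hypothetical index name. Selecting only indexed columns lets the index cover the query, while adding a column that is not in the index forces a visit to the table row.

```sql
ALTER TABLE Country ADD INDEX idx_cont_gov (Continent, GovernmentForm);  -- hypothetical name

-- Only indexed columns are needed: Extra should show "Using index".
EXPLAIN SELECT Continent, GovernmentForm
FROM Country
WHERE Continent = 'Asia' AND GovernmentForm = 'Republic';

-- Name is not in the index, so the table row must be read as well.
EXPLAIN SELECT Name
FROM Country
WHERE Continent = 'Asia' AND GovernmentForm = 'Republic';
```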

Extra ***

USING INDEX – Getting data from the index rather than the table

USING FILESORT – An extra sorting pass was needed rather than reading rows in index order; if the sort does not fit in sort_buffer_size it uses temporary files on disk (slow). ORDER BY can use indexes.

USING TEMPORARY – A temp table was created; see tmp_table_size and max_heap_table_size

USING WHERE – filter applied by the server, outside the storage engine

USING JOIN BUFFER – means no index was used for the join
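A few queries on World.City that would typically produce these notes; the exact Extra text can vary by version.

```sql
-- Population is not indexed, so ORDER BY needs a sort: expect "Using filesort".
EXPLAIN SELECT Name, Population FROM City ORDER BY Population DESC LIMIT 10;

-- ORDER BY on the primary key can read rows in index order: no filesort expected.
EXPLAIN SELECT Name FROM City ORDER BY ID LIMIT 10;

-- Grouping on an unindexed column typically needs a temporary table:
-- expect "Using temporary" (plus "Using filesort" before MySQL 8.0).
EXPLAIN SELECT District, COUNT(*) FROM City GROUP BY District;
```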

Things can get messy!

straight_join forces order of tables
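A sketch of comparing the optimizer's chosen join order with a forced one (the query is ours). Compare the order of the table rows in the two EXPLAIN outputs.

```sql
-- Let the optimizer choose the join order...
EXPLAIN SELECT Country.Name, City.Name
FROM City JOIN Country ON City.CountryCode = Country.Code
WHERE Country.Continent = 'Oceania';

-- ...or force the tables to be read in the order written in FROM (City first).
EXPLAIN SELECT STRAIGHT_JOIN Country.Name, City.Name
FROM City JOIN Country ON City.CountryCode = Country.Code
WHERE Country.Continent = 'Oceania';
```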

Index Hints

index_hint:
    USE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
  | IGNORE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
  | FORCE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

Use only as a last resort – shifts in data can make this the 'long way around'.

http://dev.mysql.com/doc/refman/5.6/en/index-hints.html
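One example of each hint on World.City, assuming its secondary index is named CountryCode as in the World sample.

```sql
-- Suggest an index; the optimizer may still choose a table scan.
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode) WHERE CountryCode = 'NLD';

-- Rule an index out, e.g. to see what the plan would be without it.
EXPLAIN SELECT Name FROM City IGNORE INDEX (CountryCode) WHERE CountryCode = 'NLD';

-- Insist on an index; a table scan is used only if the index cannot be used at all.
EXPLAIN SELECT Name FROM City FORCE INDEX (PRIMARY) WHERE ID BETWEEN 100 AND 200;
```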

Controlling the Optimizer

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off

You can turn certain optimizer settings on or off at GLOBAL or SESSION scope.

See MySQL Manual 7.8.4.2 and know your mileage may vary.
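A sketch of toggling one flag for the current session only and then restoring the defaults. The query is our own example; compare its plan with the one you get before the change.

```sql
-- Turn Index Condition Pushdown off for this session only...
SET SESSION optimizer_switch = 'index_condition_pushdown=off';

-- ...look at the plan without it...
EXPLAIN SELECT Name FROM City WHERE CountryCode BETWEEN 'NAA' AND 'NZZ';

-- ...then reset every flag to its default value.
SET SESSION optimizer_switch = 'default';
```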

Things to watch: mysqladmin -r -i 10 extended-status

Slow_queries – number in last period
Select_scan – full table scans
Select_full_join – joins that did full table scans because no index was used
Created_tmp_disk_tables – temporary tables that had to be created on disk
Key_read_requests/Key_write_requests – read/write weighting of application; may need to modify application
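The same counters can be read from inside a client session. Unlike `mysqladmin -r`, these are totals since the server started, so compare two readings to get per-period numbers.

```sql
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Slow_queries', 'Select_scan', 'Select_full_join',
                        'Created_tmp_disk_tables',
                        'Key_read_requests', 'Key_write_requests');
```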

Optimizer Tracing (5.6.3 onward)

SET optimizer_trace="enabled=on";
SELECT Name FROM City WHERE ID=999;
SELECT trace into dumpfile '/tmp/foo' FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

Shows more logic than EXPLAIN

The output shows much deeper detail on how the optimizer chooses to process a query. This level of detail is well past the level for this presentation.

Sample from the trace – but no clues on optimizing for Joe Average DBA

Final Thoughts

1. READ chapter 7 of the MySQL Manual
2. Run ANALYZE TABLE periodically
3. Minimize disk I/O

Q&A

[email protected]
