Goldilocks And The Three Queries – MySQL's EXPLAIN Explained
Dave Stokes, MySQL Community Manager, North America
Jan 14, 2015
Please Read
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Simple Introduction
EXPLAIN & EXPLAIN EXTENDED are tools to help optimize queries. As tools, they are only as good as the craftspeople using them. There is more to this subject than can be covered in a single presentation, but hopefully this session will start you out on the right path for using EXPLAIN.
Why worry about the optimizer?
Client sends statement to server
Server checks the query cache to see if it has already run the statement. If so, it retrieves the stored result and sends it back to the client.
Statement is parsed, preprocessed and optimized to make a Query Execution Plan.
The query execution engine sends the QEP to the storage engine API.
Results sent to the Client.
Once upon a time ...
There was a PHP programmer named Goldilocks who wanted to get the phone number of her friend Little Red Riding Hood in Networking. She found an old, dusty piece of code in the enchanted programmers' library. Inside the code was a special chant to get all the names and phone numbers of the employees of Grimm-Fayre-Tails Corp. And so, Goldi tried that special chant!
SELECT name, phone
FROM employees;
Oh-No!
But the special chant kept running, and running, and running.
Eventually Goldi hit control-C when she learned that Grimm had hired many, many folks: the company had 10^10 employees in the database.
A second chant
Goldi did some searching in the library and learned she could add to the chant to look only for her friend Red.
SELECT name, phone
FROM employees
WHERE name LIKE 'Red%';
Goldi crossed her fingers, held her breath, and let 'er rip.
What she got
Name, phone
Redford 1234
Redmund 2323
Redlegs 1234
Red Sox 1914
Redding 9021
But this was not what Goldilocks needed. So she asked a kindly old Java Owl for help.
The Owl's chant
'Ah, you want the nickname field!' He re-crafted her chant.
SELECT first, nick, last, phone, `group`
FROM employees
WHERE nick LIKE '%red%';
Still too much data … but better
Betty, Big Red, Lopez, 4321, Accounting
Ethel, Little Red, Riding-Hoode, 127.0.0.1, Networks
Agatha, Red Herring, Christie, 007, Public Relations
Johnny, Reds Catcher, Bench, 421, Gaming
'We can tune the query better,' cried the Owl.
SELECT first, nick, name, phone, `group`
FROM employees
WHERE nick LIKE 'Red%'
AND `group` = 'Networking';
But Goldi was too busy after she got the data she needed to listen.
The preceding were obviously flawed queries
• But how do you check if queries are running efficiently?
• What does the query the MySQL server runs really look like? (the dreaded Query Execution Plan). What is cost based optimization?
• How can you make queries faster?
EXPLAIN & EXPLAIN EXTENDED
EXPLAIN [EXTENDED | PARTITIONS]
{
SELECT statement
| DELETE statement
| INSERT statement
| REPLACE statement
| UPDATE statement
}
Or EXPLAIN tbl_name (same as DESCRIBE tbl_name)
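As a minimal sketch of the syntax above, assuming the world sample database used later in these slides is loaded (the WHERE values are only illustrative):

```sql
-- Ask the optimizer how it would run a SELECT; no City rows are returned
EXPLAIN SELECT Name, Population FROM City WHERE CountryCode = 'USA';

-- As of 5.6, data-changing statements can be explained too; the UPDATE is not applied
EXPLAIN UPDATE City SET Population = Population + 1 WHERE ID = 1;

-- With just a table name, EXPLAIN behaves like DESCRIBE and lists the columns
EXPLAIN City;
```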
What is being EXPLAINed
Prepending EXPLAIN to a statement* asks the optimizer how it plans to execute that statement at the lowest cost, measured in disk page seeks** (and sometimes it guesses wrong).
What it can tell you:
-- Where to add INDEXes to speed row access
-- Check JOIN order
And Optimizer Tracing (more later) has been recently introduced!
* SELECT, DELETE, INSERT, REPLACE & UPDATE as of 5.6; only SELECT in 5.5 & previous
** Does not know if a page is in memory or on disk (storage engine's problem, not the optimizer's), see MySQL Manual 7.8.3
The Columns
id             Which SELECT
select_type    The SELECT type
table          Output row table
type           JOIN type
possible_keys  Potential indexes
key            Actual index used
key_len        Length of actual index
ref            Columns used against index
rows           Estimate of rows
Extra          Additional info
A first look at EXPLAIN...using World database
Will read all 4079 rows – all the rows in this table
EXPLAIN EXTENDED -> query plan
filtered: Estimated % of rows filtered by condition
The query as seen by server (kind of, sort of, close)
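A short sketch of how to see that rewritten query, assuming MySQL 5.6 and the world sample database (the WHERE value is illustrative). The rewritten statement is reported as a Note, so it has to be fetched with SHOW WARNINGS right after the EXPLAIN:

```sql
-- Adds the filtered column to the usual EXPLAIN output
EXPLAIN EXTENDED SELECT Name FROM City WHERE CountryCode = 'USA';

-- Must run immediately afterwards in the same session;
-- the Message column holds the statement as the optimizer rewrote it
SHOW WARNINGS;
```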
Add in a WHERE clause
Time for a quick review of indexes
Advantages
– Go right to desired row(s) instead of reading ALL ROWS
– Smaller than whole table (read from disk faster)
– Can 'carry' other data with compound indexes
Disadvantages
– Overhead*
  • CRUD
– Not used on full table scans
* May need to run ANALYZE TABLE to update statistics such as cardinality to help optimizer make better choices
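A minimal sketch of refreshing and then inspecting those statistics, assuming the City table from the world sample database:

```sql
-- Re-sample the key distribution so the optimizer works from current statistics
ANALYZE TABLE City;

-- The Cardinality column shows the estimated number of distinct values per index
SHOW INDEX FROM City;
```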
Quiz: Why read 4079 rows when only five are needed?
Information in the type Column
ALL – full table scan (to be avoided when possible)
CONST – WHERE ID=1
EQ_REF – WHERE a.ID = b.ID (uses indexes, 1 row returned)
REF – WHERE state='CA' (multiple rows for key values)
REF_OR_NULL – WHERE ID = 1 OR ID IS NULL (extra lookup needed for NULL)
INDEX_MERGE – WHERE ID = 10 OR state = 'CA'
RANGE – WHERE x IN (10,20,30)
INDEX – full index scan (usually faster when index file < data file)
UNIQUE_SUBQUERY – index lookup replacing some IN subqueries (unique index)
INDEX_SUBQUERY – same, for non-unique indexes
SYSTEM – Table with 1 row or in-memory table
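The statements below are a sketch of queries meant to illustrate several of these access types. They assume the stock world sample database (primary key on City.ID and Country.Code, a non-unique index on City.CountryCode, no index on City.Name). The comments name the type each query is a candidate for; the optimizer may still choose differently on your data.

```sql
-- const: equality on the primary key, at most one row
EXPLAIN SELECT Name FROM City WHERE ID = 1;

-- eq_ref candidate: one Country row per City row via Country's primary key
-- (depends on the join order the optimizer picks)
EXPLAIN SELECT City.Name, Country.Name
FROM City JOIN Country ON City.CountryCode = Country.Code;

-- ref candidate: equality on a non-unique index, several rows per key value
EXPLAIN SELECT Name FROM City WHERE CountryCode = 'USA';

-- range candidate: a list of values on an indexed column
EXPLAIN SELECT Name FROM City WHERE ID IN (10, 20, 30);

-- ALL candidate: filter on a column assumed to have no index
EXPLAIN SELECT ID FROM City WHERE Name = 'Dallas';
```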
Full table scans VS Index
So let's create a copy of the World.City table that has no indexes. The optimizer estimates that it would require 4,279 rows to be read to find the desired record – 5% more than actual rows.
And the table has only 4,079 rows.
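One way to set up this comparison, as a sketch: CREATE TABLE ... AS SELECT copies the rows but not the indexes. The table name CityNoIdx and the ID value are hypothetical, chosen only for illustration.

```sql
-- Copy the data only; the new table has no primary key and no secondary indexes
CREATE TABLE CityNoIdx AS SELECT * FROM City;

-- Without an index the optimizer has to plan a full table scan (type ALL)
EXPLAIN SELECT Name FROM CityNoIdx WHERE ID = 999;

-- Same lookup against the original table, which can use the primary key
EXPLAIN SELECT Name FROM City WHERE ID = 999;
```

Comparing the rows column of the two plans shows how far the estimate moves once the index is gone.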
How does NULL change things?
Taking NOT NULL away from the ID field (plus the previous index) increases the estimated rows read to 4296! Roughly 5.3% more rows than are actually in the file.
Running ANALYZE TABLE reduces the count to 3816 – still > 1
Both of the following return 1 row
EXPLAIN PARTITIONS -Add 12 hash partitions to City
Some parts of your query may be hidden!!
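A sketch of how such a test could be set up, assuming MySQL 5.6. It works on a copy named CityPart (a hypothetical name) so the original table and any foreign keys on it are left alone.

```sql
-- Work on a copy of the data
CREATE TABLE CityPart AS SELECT * FROM City;

-- Spread the rows over 12 partitions by hashing the integer ID column
ALTER TABLE CityPart PARTITION BY HASH (ID) PARTITIONS 12;

-- The extra partitions column lists which partitions the plan would touch
EXPLAIN PARTITIONS SELECT Name FROM CityPart WHERE ID = 999;
```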
Latin1 versus UTF8
Create a copy of the City table but with the UTF8 character set replacing Latin1. The three-byte key_len grows to nine bytes, because UTF8 reserves up to three bytes per character. That is more data to read and more to compare, which is pronounced 'slower'.
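The comparison can be reproduced along these lines. This is a sketch that assumes City is stored as Latin1, as in the slides; CityUtf8 is a hypothetical table name and 'USA' an illustrative value.

```sql
-- LIKE copies the column definitions and indexes (but not the rows)
CREATE TABLE CityUtf8 LIKE City;

-- Convert every character column in the copy to utf8
ALTER TABLE CityUtf8 CONVERT TO CHARACTER SET utf8;

-- Fill the copy with the original data
INSERT INTO CityUtf8 SELECT * FROM City;

-- Compare the key_len column of the two plans
EXPLAIN SELECT Name FROM City WHERE CountryCode = 'USA';
EXPLAIN SELECT Name FROM CityUtf8 WHERE CountryCode = 'USA';
```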
INDEX Length
If we create a new index on CountryCode with a length of 2 bytes, does it work as well as the original 3 bytes?
Forcing use of new shorter index ...
Still generates a guesstimate that 39 rows must be read.
In some cases there is performance to be gained in using shorter indexes.
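A sketch of the experiment, with a hypothetical index name (cc2) and an illustrative country code. The slide's own query is not shown, so the estimates you see may differ from the figure quoted above.

```sql
-- Prefix index: only the first 2 characters of CountryCode are indexed
CREATE INDEX cc2 ON City (CountryCode(2));

-- Force the optimizer to use the shorter index and compare rows and key_len
EXPLAIN SELECT Name FROM City FORCE INDEX (cc2) WHERE CountryCode = 'USA';

-- Remove the experimental index when done
DROP INDEX cc2 ON City;
```

A shorter prefix makes the index smaller but less selective, since several codes can share the same first two characters.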
Subqueries
Run as part of EXPLAIN execution and may cause significant overhead. So be careful when testing.
Note here that #1 is not using an index. And that is why we recommend rewriting subqueries as joins.
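As an illustration of that kind of rewrite (not the query from the slide), here is a subquery against the world sample database and an equivalent join. The two return the same cities because Country.Code is the primary key, so the join cannot duplicate rows.

```sql
-- Subquery form
EXPLAIN SELECT Name
FROM City
WHERE CountryCode IN (SELECT Code FROM Country WHERE Continent = 'Europe');

-- Join form of the same question
EXPLAIN SELECT City.Name
FROM City
JOIN Country ON City.CountryCode = Country.Code
WHERE Country.Continent = 'Europe';
```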
EXAMPLE of covering Indexing
In this case, adding an index reduces the reads from 239 to 42.
Can we do better for this query?
Index on both Continent and Government Form
With both Continent and GovernmentForm indexed together, we go from 42 rows read to 19.
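The progression above can be sketched as follows. The index names and the WHERE values are hypothetical, and the row estimates on your server may not match the figures quoted in the slides.

```sql
-- Step 1: single-column index on Continent
CREATE INDEX idx_continent ON Country (Continent);
EXPLAIN SELECT Name FROM Country
WHERE Continent = 'Asia' AND GovernmentForm = 'Republic';

-- Step 2: compound index on both columns used in the WHERE clause
CREATE INDEX idx_continent_gov ON Country (Continent, GovernmentForm);
EXPLAIN SELECT Name FROM Country
WHERE Continent = 'Asia' AND GovernmentForm = 'Republic';

-- A query that only needs columns stored in the index can be answered
-- from the index alone, which is what a covering index means
EXPLAIN SELECT Continent, GovernmentForm FROM Country
WHERE Continent = 'Asia';
```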
Using index means the data is retrieved from index not table (good)
Using index condition means eval pushed down to storage engine. This can reduce storage engine read of table and server reads of storage engine (not bad)
Extra ***
USING INDEX – Getting data from the index rather than the table
USING FILESORT – Sorting was needed rather than using an index. May use the file system (slow). ORDER BY can use indexes
USING TEMPORARY – A temp table was created – see tmp_table_size and max_heap_table_size
USING WHERE – Filter outside storage engine
USING JOIN BUFFER – Means no index used
Things can get messy!
straight_join forces order of tables
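A small sketch of the difference, using the world sample database. With STRAIGHT_JOIN the tables are read in the order they appear in the FROM clause; without it the optimizer is free to reorder them.

```sql
-- Optimizer chooses the join order
EXPLAIN SELECT City.Name, Country.Name
FROM City JOIN Country ON City.CountryCode = Country.Code;

-- City is read first, then Country, regardless of the optimizer's estimates
EXPLAIN SELECT STRAIGHT_JOIN City.Name, Country.Name
FROM City JOIN Country ON City.CountryCode = Country.Code;
```

Comparing the order of the rows in the two EXPLAIN outputs shows whether the hint changed the plan.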
Index Hints
index_hint:
    USE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
  | IGNORE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
  | FORCE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
Use only as a last resort – shifts in data can make this the 'long way around'.
http://dev.mysql.com/doc/refman/5.6/en/index-hints.html
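A few hedged examples of the hint syntax, assuming the stock world sample database where the index on City.CountryCode is itself named CountryCode (check SHOW INDEX FROM City for the name on your copy):

```sql
-- Suggest an index; the optimizer may still prefer a table scan
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE CountryCode = 'USA';

-- Hide an index from the optimizer for this statement only
EXPLAIN SELECT Name FROM City IGNORE INDEX (CountryCode)
WHERE CountryCode = 'USA';

-- Treat a table scan as very expensive so the named index is used if at all possible
EXPLAIN SELECT Name FROM City FORCE INDEX (CountryCode)
WHERE CountryCode = 'USA';
```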
Controlling the Optimizer
mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off
You can turn on or off certain optimizer settings for GLOBAL or SESSION
See MySQL Manual 7.8.4.2 and know your mileage may vary.
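A minimal sketch of toggling one flag for the current session only, assuming MySQL 5.6. The flag chosen here and the test query are illustrative; only the flags you name are changed.

```sql
-- Turn off index condition pushdown for this connection
SET SESSION optimizer_switch = 'index_condition_pushdown=off';

-- Re-run EXPLAIN on a query of interest and compare the Extra column
EXPLAIN SELECT Name FROM City WHERE CountryCode LIKE 'U%';

-- Put the flag back to its default value
SET SESSION optimizer_switch = 'index_condition_pushdown=default';

-- Confirm the current settings
SELECT @@SESSION.optimizer_switch;
```

Using SESSION rather than GLOBAL keeps the experiment from affecting other connections.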
Things to watch
mysqladmin -r -i 10 extended-status
Slow_queries – number in last period
Select_scan – full table scans
Select_full_join – joins that needed full scans to complete
Created_tmp_disk_tables – temporary tables written to disk
Key_read_requests/Key_write_requests – read/write weighting of application, may need to modify application
Optimizer Tracing (5.6.3 onward)
SET optimizer_trace="enabled=on";
SELECT Name FROM City WHERE ID=999;
SELECT trace INTO DUMPFILE '/tmp/foo' FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
Shows more logic than EXPLAIN
The output shows much deeper detail on how the optimizer chooses to process a query. This level of detail is well past the level for this presentation.
Sample from the trace – but no clues on optimizing for Joe Average DBA
Final Thoughts
1. READ chapter 7 of the MySQL Manual
2. Run ANALYZE TABLE periodically
3. Minimize disk I/O
Q&A