Friday, November 24, 2017

Going Beyond Tabular EXPLAIN

A while ago, Lukas Eder posted a very interesting article on query optimizations that do not depend on the cost model. We often call such optimizations query transformations since they can be applied by rewriting the query.

In his blog post, Lukas investigated how different database systems handle different opportunities for query transformations. For MySQL, he complained that in some cases, the output from EXPLAIN is not sufficient to tell what is going on. However, as I will show in this blog post, there are other ways to get the information that he was looking for.

The EXPLAIN Warning

What may easily be overlooked when executing EXPLAIN for a query, is that it literally comes with a warning. This warning shows the query after the query transformations have been applied. Let's look at the query Lukas used to explore Transitive Closure:

SELECT first_name, last_name, film_id
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
WHERE a.actor_id = 1;

If we run EXPLAIN for this query, and display the associated warning, we see:

mysql> EXPLAIN SELECT first_name, last_name, film_id FROM actor a JOIN film_actor fa ON a.actor_id = fa.actor_id WHERE a.actor_id = 1;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | a     | NULL       | const | PRIMARY       | PRIMARY | 2       | const |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | fa    | NULL       | ref   | PRIMARY       | PRIMARY | 2       | const |   19 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
2 rows in set, 1 warning (0,00 sec)

mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                       |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select 'PENELOPE' AS `first_name`,'GUINESS' AS `last_name`,`sakila`.`fa`.`film_id` AS `film_id` from `sakila`.`actor` `a` join `sakila`.`film_actor` `fa` where (`sakila`.`fa`.`actor_id` = 1) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)

If you scroll to the right, you will see that the warning contains: `sakila`.`fa`.`actor_id` = 1. In other words, the predicate actor_id = 1 has been applied to the film_actor table because of transitive closure.

The above warning message also illustrates another aspect of MySQL query optimization. Since the query specifies the value for the primary key of the actor table, the primary key look-up will be done in the optimizer phase. Hence, the warning shows that columns from the actor table have been replaced by the column values for the requested row.

Structured EXPLAIN

MySQL 5.6 introduced Structured EXPLAIN. Its output describes the query plan in JSON format. This output contains additional information compared to traditional EXPLAIN. For example, while the tabular EXPLAIN only says "Using where" when a condition is applied when reading a table, the JSON output will display the actual condition. Let's look at the output for the first example on Removing “Silly” Predicates:

mysql> EXPLAIN FORMAT=JSON SELECT * FROM film WHERE release_year = release_year;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "212.00"
    },
    "table": {
      "table_name": "film",
      "access_type": "ALL",
      "rows_examined_per_scan": 1000,
      "rows_produced_per_join": 100,
      "filtered": "10.00",
      "cost_info": {
        "read_cost": "192.00",
        "eval_cost": "20.00",
        "prefix_cost": "212.00",
        "data_read_per_join": "78K"
      },
      "used_columns": [
        "film_id",
        "title",
        "description",
        "release_year",
        "language_id",
        "original_language_id",
        "rental_duration",
        "rental_rate",
        "length",
        "replacement_cost",
        "rating",
        "special_features",
        "last_update"
      ],
      "attached_condition": "(`sakila`.`film`.`release_year` = `sakila`.`film`.`release_year`)"
    }
  }
} |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The value for "attached_condition" shows that MySQL does not simplify the condition release_year = release_year to release_year IS NOT NULL. So Lukas is right in his educated guess that MySQL does not optimize this. However, if we look at a similar silly predicate on a NOT NULL column, we see that such predicates are eliminated by MySQL:

mysql> EXPLAIN FORMAT=JSON SELECT * FROM film WHERE film_id = film_id;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "212.00"
    },
    "table": {
      "table_name": "film",
      "access_type": "ALL",
      "rows_examined_per_scan": 1000,
      "rows_produced_per_join": 1000,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "12.00",
        "eval_cost": "200.00",
        "prefix_cost": "212.00",
        "data_read_per_join": "781K"
      },
      "used_columns": [
        "film_id",
        "title",
        "description",
        "release_year",
        "language_id",
        "original_language_id",
        "rental_duration",
        "rental_rate",
        "length",
        "replacement_cost",
        "rating",
        "special_features",
        "last_update"
      ]
    }
  }
} |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In this case, there is no "attached_condition" in the JSON output. In other words, the silly predicate has been removed.

Optimizer Trace

Predicate Merging investigates whether the optimizer will merge two predicates on the same column. There are actually two different aspects here:

  1. Whether predicates are merged and evaluated as a single predicate
  2. Whether key ranges as specified by the query are merged when setting up index range scans
The latter is the most important since it ensures that not more rows than necessary are accessed. Lukas concludes that MySQL does predicate merging based on the estimated number of rows that tabular EXPLAIN shows:
mysql> EXPLAIN SELECT * 
    -> FROM actor
    -> WHERE actor_id IN (2, 3, 4)
    -> AND actor_id IN (1, 2, 3);
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | actor | NULL       | range | PRIMARY       | PRIMARY | 2       | NULL |    2 |   100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)
Here EXPLAIN shows that the estimated number of rows to be read is 2, and this fits with how many rows satisfy the merged predicates.

If we look at the output from structured EXPLAIN, we get a different picture:

mysql> EXPLAIN FORMAT=JSON SELECT *  FROM actor WHERE actor_id IN (2, 3, 4) AND actor_id IN (1, 2, 3);
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.81"
    },
    "table": {
      "table_name": "actor",
      "access_type": "range",
      "possible_keys": [
        "PRIMARY"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
        "actor_id"
      ],
      "key_length": "2",
      "rows_examined_per_scan": 2,
      "rows_produced_per_join": 2,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "2.41",
        "eval_cost": "0.40",
        "prefix_cost": "2.81",
        "data_read_per_join": "560"
      },
      "used_columns": [
        "actor_id",
        "first_name",
        "last_name",
        "last_update"
      ],
      "attached_condition": "((`sakila`.`actor`.`actor_id` in (2,3,4)) and (`sakila`.`actor`.`actor_id` in (1,2,3)))"
    }
  }
} |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
As you can see from "attached_condition", the predicates are not actually merged. It is only the key ranges that are merged. Structured EXPLAIN does not show what key ranges are used for the index range scan. However, we can use Optimizer Trace to find this information. (See instructions on how to obtain the optimizer trace.) The optimizer trace for the above query contains this part:
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 2,
                      "ranges": [
                        "2 <= actor_id <= 2",
                        "3 <= actor_id <= 3"
                      ]
                    },
                    "rows_for_plan": 2,
                    "cost_for_plan": 2.41,
                    "chosen": true
                  }
This shows that the range optimizer has actually set up two ranges here; one for actor_id = 2 and one for actor_id = 3. (In other words, the range optimizer does not seem to take into account that actor_id is an integer column.)

It is the same story for Lukas' other query investigating predicate merging. That query specifies two overlapping key ranges:

SELECT *
FROM film
WHERE film_id BETWEEN 1 AND 100
AND film_id BETWEEN 99 AND 200;

Also in this case structured EXPLAIN shows that predicates are not merged, while optimizer trace shows that the key ranges are merged:

                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 2,
                      "ranges": [
                        "99 <= film_id <= 100"
                      ]
                    },
                    "rows_for_plan": 2,
                    "cost_for_plan": 2.41,
                    "chosen": true
                  }
Since this query has range predicates instead of equality predicates, MySQL will here set up a single range scan from 99 to 100.

Concluding Remarks

I have in this blog post shown some examples of how you can get additional information about a query plan, beyond what you can see in the output of tabular EXPLAIN. There is a lot of other information that you can deduct from looking at the EXPLAIN Warning, Structured EXPLAIN, or Optimizer Trace than what I have discussed here. That can be the topic for later blog posts.