Thursday, May 2, 2013

When is a Subquery Executed?

In an earlier blog post, I managed to confuse myself as to when a subquery was executed. It is not very clear from the output of EXPLAIN where the execution of a subquery takes place. Let's take a look at the following example query (Query 17 in the DBT-3 benchmark):

select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part
where p_partkey = l_partkey
  and p_brand = 'Brand#33' and p_container = 'LG CAN'
  and l_quantity < (
 select 0.2 * avg(l_quantity)
 from lineitem
 where l_partkey = p_partkey
);

If you run EXPLAIN on this query, you will see the following execution plan:

+----+--------------------+----------+------+---------------------------------+---------------------+---------+---------------------+---------+-------------+
| id | select_type        | table    | type | possible_keys                   | key                 | key_len | ref                 | rows    | Extra       |
+----+--------------------+----------+------+---------------------------------+---------------------+---------+---------------------+---------+-------------+
|  1 | PRIMARY            | part     | ALL  | PRIMARY                         | NULL                | NULL    | NULL                | 2000000 | Using where |
|  1 | PRIMARY            | lineitem | ref  | i_l_suppkey_partkey,i_l_partkey | i_l_suppkey_partkey | 5       | dbt3.part.p_partkey |      14 | Using where |
|  2 | DEPENDENT SUBQUERY | lineitem | ref  | i_l_suppkey_partkey,i_l_partkey | i_l_suppkey_partkey | 5       | dbt3.part.p_partkey |      14 | NULL        |
+----+--------------------+----------+------+---------------------------------+---------------------+---------+---------------------+---------+-------------+

It is not straightforward to determine from this plan when the subquery will be executed. We see that both tables of the join has "Using where" in the Extra column. In other words, the original WHERE clause has been split between the two tables. The question is: Which part contains the subquery?

In this specific case, the answer is pretty evident when looking closer at the query. The subquery is part of an expression which refers to a column of the lineitem table, l_quantity, and within the subquery, there is also also a reference to p_partkey of the part table. In other words, columns from both tables of the join is needed when executing the subqquery. Hence, the subquery will be part of the where clause attached to the last table of the join.

This means that the above query will, for each row in the part table that satisfies the condition attached to the part table, look up matching rows in the lineitem table, and, for every matching row, execute the subquery. As you probably realize, one will have to know quite a bit about how the optimizer handles WHERE conditions in order to deduce this. But don't despair! An extension to EXPLAIN in MySQL 5.6 make things much clearer.

Structured EXPLAIN

With MySQL 5.6, you can get a structured version of the query plan in JSON format. All you need to do is to write "EXPLAIN FORMAT=JSON" instead of just EXPLAIN. Let's take a look at what the JSON EXPLAIN output for the query discussed above looks like:

{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "part",
          "access_type": "ALL",
          "possible_keys": [
            "PRIMARY"
          ] /* possible_keys */,
          "rows": 2000000,
          "filtered": 100,
          "attached_condition": "((`dbt3`.`part`.`p_container` = 'LG CAN') and (`dbt3`.`part`.`p_brand` = 'Brand#33'))"
        } /* table */
      },
      {
        "table": {
          "table_name": "lineitem",
          "access_type": "ref",
          "possible_keys": [
            "i_l_suppkey_partkey",
            "i_l_partkey"
          ] /* possible_keys */,
          "key": "i_l_suppkey_partkey",
          "used_key_parts": [
            "l_partkey"
          ] /* used_key_parts */,
          "key_length": "5",
          "ref": [
            "dbt3.part.p_partkey"
          ] /* ref */,
          "rows": 14,
          "filtered": 100,
          "attached_condition": "(`dbt3`.`lineitem`.`l_quantity` < (/* select#2 */ select (0.2 * avg(`dbt3`.`lineitem`.`l_quantity`)) from `dbt3`.`lineitem` where (`dbt3`.`lineitem`.`l_partkey` = `dbt3`.`part`.`p_partkey`)))",
          "attached_subqueries": [
            {
              "dependent": true,
              "cacheable": false,
              "query_block": {
                "select_id": 2,
                "table": {
                  "table_name": "lineitem",
                  "access_type": "ref",
                  "possible_keys": [
                    "i_l_suppkey_partkey",
                    "i_l_partkey"
                  ] /* possible_keys */,
                  "key": "i_l_suppkey_partkey",
                  "used_key_parts": [
                    "l_partkey"
                  ] /* used_key_parts */,
                  "key_length": "5",
                  "ref": [
                    "dbt3.part.p_partkey"
                  ] /* ref */,
                  "rows": 14,
                  "filtered": 100
                } /* table */
              } /* query_block */
            }
          ] /* attached_subqueries */
        } /* table */
      }
    ] /* nested_loop */
  } /* query_block */
}

You probably recognize much of the information from traditional EXPLAIN, but there is also some information that traditional EXPLAIN do not show. The "attached_condition" field shows which parts of the WHERE condition is evaluated at which stage of the query execution. That is, the condition attached to a table will be evaluated for each row that is read from this table. In this specific example, it shows that the subquery is part of the condition attached to the lineitem table, and the "attached_subqueries" field for that table contains the execution plan for the subquery.

Visual EXPLAIN

Structured EXPLAIN, or EXPLAIN in JSON format, is also much more suitable for tools that want to process and display query plans. MySQL Workbench 5.2 uses structural EXPLAIN to visualize MySQL query plans. Below is shown the result of Visual EXPLAIN for the example query used in this blog post:

Note that MySQL Workbench colors the table boxes based on which access method is used. A red box indicates a tables scan, while a green box is used for index look-ups (ref access).

As we all know: "A picture is worth a thousand words", and from the above picture it should be pretty clear when the subquery is executed.