Friday, July 18, 2014

MySQL Webinar: MySQL EXPLAIN, explained

Some time ago, Matt Lord and I delivered a webinar on the MySQL EXPLAIN feature.  This webinar is available for on-demand access here.  Based on the questions we got during the webinar, I want to emphasize that EXPLAIN does not execute the query, it only determines the query plan for the query.  Hence, EXPLAIN will not be able to evaluate how good the chosen query plan actually is.

If you have questions on this topic after listening to this webinar, feel free to ask questions; either as comments on this blog or at the MySQL Optimizer Forum.

You can also access other webinars on MySQL. New webinars will be announced here.

Monday, July 14, 2014

A Multi-Table Trick to Speed up Single-Table UPDATE/DELETE Statements

This post appeared first on mysqlserverteam.com

In MySQL, query optimization of single-table UPDATE/DELETE statements is more limited than for SELECT statements. I guess the main reason for this is to limit the optimizer overhead for very simple statements. However, this also means that optimization opportunities are sometimes missed for more complex UPDATE/DELETE statements.

Example

Using the DBT-3 database, the following SQL statement will increase prices by 10% on parts from suppliers in the specified country:
UPDATE part
SET p_retailprice = p_retailprice*1.10
WHERE p_partkey IN
     (SELECT ps_partkey
      FROM partsupp JOIN supplier
      ON ps_suppkey = s_suppkey
      WHERE s_nationkey = 4);

Visual EXPLAIN in MySQL Workbench shows that the optimizer will choose the following execution plan for this UPDATE statement:

update-subquery

That is, for every row in the part table, MySQL will check if this part is supplied by a supplier of the requested nationality. Consider the following similar SELECT statement:
SELECT * FROM part 
WHERE p_partkey IN 
     (SELECT ps_partkey 
      FROM partsupp JOIN supplier 
      ON ps_suppkey = s_suppkey 
      WHERE s_nationkey = 4);

In MySQL 5.6, the query optimizer will apply semi-join transformation to this query. Hence, the execution plan is quite different from the similar UPDATE statement:

select-semijoin

As you can see, there is no sub-query in this plan. The query has been transformed into a three-way join. The great advantage of this semi-join transformation is that the optimizer is now free to re-arrange the order of the tables to be joined. Instead of having to go through all 179,000 parts, it will now start with the estimated 414 suppliers from the given country and find all parts supplied by them. This is obviously more efficient, and it would be good if MySQL would use the same approach for the UPDATE statement.

The Multi-Table Trick

Unlike single-table UPDATE statements, the MySQL Optimizer will use all available optimizations for multi-table UPDATE statements. This means that by rewriting the query as follows, the semi-join optimizations will apply:
UPDATE part, (SELECT 1) dummy 
SET p_retailprice = p_retailprice*1.10
WHERE p_partkey IN 
     (SELECT ps_partkey 
      FROM partsupp JOIN supplier 
      ON ps_suppkey = s_suppkey 
      WHERE s_nationkey = 4);

Notice the extra dummy table in the first line. Here is what happens when I execute the single-table and multi-table variants on a DBT-3 database (scale factor 1):

mysql> UPDATE part SET p_retailprice = p_retailprice*1.10 WHERE p_partkey IN (SELECT ps_partkey FROM partsupp JOIN supplier ON ps_suppkey = s_suppkey WHERE s_nationkey = 4);
Query OK, 31097 rows affected, 28003 warnings (2.63 sec)
Rows matched: 31097  Changed: 31097  Warnings: 28003

mysql> ROLLBACK;
Query OK, 0 rows affected (0.20 sec)

mysql> UPDATE part, (SELECT 1) dummy SET p_retailprice = p_retailprice*1.10 WHERE p_partkey IN (SELECT ps_partkey FROM partsupp JOIN supplier ON ps_suppkey = s_suppkey WHERE s_nationkey = 4); 
Query OK, 31097 rows affected, 28003 warnings (0.40 sec)
Rows matched: 31097  Changed: 31097  Warnings: 28003

As you can see, execution time is reduced from 2.63 seconds to 0.40 seconds by using this trick. (I had executed both statements several times before, so the reported execution times are for a steady state with all accessed data in memory.)

Multi-Table DELETE

The same trick can be used for DELETE statements. Instead of the single-table variant,
DELETE FROM part
WHERE p_partkey IN
       (SELECT ps_partkey
        FROM partsupp JOIN supplier
        ON ps_suppkey = s_suppkey
        WHERE s_nationkey = 4);

you can use the equivalent multi-table variant:

DELETE part FROM part
WHERE p_partkey IN
       (SELECT ps_partkey
        FROM partsupp JOIN supplier
        ON ps_suppkey = s_suppkey
        WHERE s_nationkey = 4);

This rewrite gives a similar performance improvement as reported for the above UPDATE statement.