This post appeared first on mysqlserverteam.com
In MySQL, query optimization of single-table UPDATE/DELETE statements is more limited than for SELECT statements. I guess the main reason for this is to limit the optimizer overhead for very simple statements. However, this also means that optimization opportunities are sometimes missed for more complex UPDATE/DELETE statements.
Example
Using the
DBT-3 database, the following SQL statement will increase prices by 10% on parts from suppliers in the specified country:
UPDATE part
SET p_retailprice = p_retailprice*1.10
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
Visual EXPLAIN in MySQL Workbench shows that the optimizer will choose the following execution plan for this UPDATE statement:
That is, for every row in the
part table, MySQL will check if this part is supplied by a supplier of the requested nationality.
Consider the following similar SELECT statement:
SELECT * FROM part
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
In MySQL 5.6, the query optimizer will apply semi-join transformation to this query. Hence, the execution plan is quite different from the similar UPDATE statement:
As you can see, there is no sub-query in this plan. The query has been transformed into a three-way join. The great advantage of this semi-join transformation is that the optimizer is now free to re-arrange the order of the tables to be joined. Instead of having to go through all 179,000 parts, it will now start with the estimated 414 suppliers from the given country and find all parts supplied by them. This is obviously more efficient, and it would be good if MySQL would use the same approach for the UPDATE statement.
The Multi-Table Trick
Unlike single-table UPDATE statements, the MySQL Optimizer will use all available optimizations for multi-table
UPDATE statements. This means that by rewriting the query as follows, the semi-join optimizations will apply:
UPDATE part, (SELECT 1) dummy
SET p_retailprice = p_retailprice*1.10
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
Notice the extra dummy table in the first line. Here is what happens when I execute the single-table and multi-table variants on a DBT-3 database (scale factor 1):
mysql> UPDATE part SET p_retailprice = p_retailprice*1.10 WHERE p_partkey IN (SELECT ps_partkey FROM partsupp JOIN supplier ON ps_suppkey = s_suppkey WHERE s_nationkey = 4);
Query OK, 31097 rows affected, 28003 warnings (2.63 sec)
Rows matched: 31097 Changed: 31097 Warnings: 28003
mysql> ROLLBACK;
Query OK, 0 rows affected (0.20 sec)
mysql> UPDATE part, (SELECT 1) dummy SET p_retailprice = p_retailprice*1.10 WHERE p_partkey IN (SELECT ps_partkey FROM partsupp JOIN supplier ON ps_suppkey = s_suppkey WHERE s_nationkey = 4);
Query OK, 31097 rows affected, 28003 warnings (0.40 sec)
Rows matched: 31097 Changed: 31097 Warnings: 28003
As you can see, execution time is reduced from 2.63 seconds to 0.40 seconds by using this trick. (I had executed both statements several times before, so the reported execution times are for a steady state with all accessed data in memory.)
Multi-Table DELETE
The same trick can be used for
DELETE statements. Instead of the single-table variant,
DELETE FROM part
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
you can use the equivalent multi-table variant:
DELETE part FROM part
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
This rewrite gives a similar performance improvement as reported for the above UPDATE statement.