If you have questions on this topic after listening to this webinar, feel free to ask questions; either as comments on this blog or at the MySQL Optimizer Forum.
You can also access other webinars on MySQL. New webinars will be announced here.
This post appeared first on mysqlserverteam.com
In MySQL, query optimization of single-table UPDATE/DELETE statements is more limited than for SELECT statements. I guess the main reason for this is to limit the optimizer overhead for very simple statements. However, this also means that optimization opportunities are sometimes missed for more complex UPDATE/DELETE statements.UPDATE part
SET p_retailprice = p_retailprice*1.10
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
Visual EXPLAIN in MySQL Workbench shows that the optimizer will choose the following execution plan for this UPDATE statement:
That is, for every row in the part table, MySQL will check if this part is supplied by a supplier of the requested nationality. Consider the following similar SELECT statement:SELECT * FROM part
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
In MySQL 5.6, the query optimizer will apply semi-join transformation to this query. Hence, the execution plan is quite different from the similar UPDATE statement:
As you can see, there is no sub-query in this plan. The query has been transformed into a three-way join. The great advantage of this semi-join transformation is that the optimizer is now free to re-arrange the order of the tables to be joined. Instead of having to go through all 179,000 parts, it will now start with the estimated 414 suppliers from the given country and find all parts supplied by them. This is obviously more efficient, and it would be good if MySQL would use the same approach for the UPDATE statement.UPDATE part, (SELECT 1) dummy
SET p_retailprice = p_retailprice*1.10
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
Notice the extra dummy table in the first line. Here is what happens when I execute the single-table and multi-table variants on a DBT-3 database (scale factor 1):
mysql> UPDATE part SET p_retailprice = p_retailprice*1.10 WHERE p_partkey IN (SELECT ps_partkey FROM partsupp JOIN supplier ON ps_suppkey = s_suppkey WHERE s_nationkey = 4);
Query OK, 31097 rows affected, 28003 warnings (2.63 sec)
Rows matched: 31097 Changed: 31097 Warnings: 28003
mysql> ROLLBACK;
Query OK, 0 rows affected (0.20 sec)
mysql> UPDATE part, (SELECT 1) dummy SET p_retailprice = p_retailprice*1.10 WHERE p_partkey IN (SELECT ps_partkey FROM partsupp JOIN supplier ON ps_suppkey = s_suppkey WHERE s_nationkey = 4);
Query OK, 31097 rows affected, 28003 warnings (0.40 sec)
Rows matched: 31097 Changed: 31097 Warnings: 28003
As you can see, execution time is reduced from 2.63 seconds to 0.40 seconds by using this trick. (I had executed both statements several times before, so the reported execution times are for a steady state with all accessed data in memory.)
DELETE FROM part
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
you can use the equivalent multi-table variant:
DELETE part FROM part
WHERE p_partkey IN
(SELECT ps_partkey
FROM partsupp JOIN supplier
ON ps_suppkey = s_suppkey
WHERE s_nationkey = 4);
This rewrite gives a similar performance improvement as reported for the above UPDATE statement.