Wednesday, February 25, 2015

MySQL Webinar: Analyze & Tune Queries for Better Performance

Thanks to everyone who attended my webinar today, and thanks for all the positive feedback in the Q&A session.  Unfortunately, I was not able to respond to everyone during the 15 minutes available for Q&A.  If your question did not get answered, feel free to use the comments section of this blog post to ask your question.  You can also ask questions on the MySQL Optimizer Forum.

The slides from the presentation are available here.  In a few days, I expect it to be possible to access it as an MySQL On-Demand Webinar.

Wednesday, February 18, 2015

MySQL Webinar: Analyze & Tune Queries for Better Performance

On Wednesday, February 25 at 18:00 CET (9 am Pacific Time), I will do webinar on how to analyze and tune MySQL queries for better performance.

The webinar covers how the MySQL optimizer chooses a specific plan to execute SQL queries. I will also show you how to use tools such as EXPLAIN (including the new JSON-based output) and Optimizer Trace to analyze query plans. We will also review how the Visual Explain functionality available in MySQL Workbench helps us visualize these plans. The webinar will also contain several examples of how to take advantage of the query analysis to improve performance of MySQL queries.

The presentation will be approximately 60 minutes long followed by Q&A.

For details on how to register for the webinar visit http://www.mysql.com/news-and-events/web-seminars/analyze-tune-queries-for-better-performance/

Monday, September 29, 2014

Speaking at MySQL Central @ Open World

I am speaking at two sessions at Oracle Open World this week; one tutorial and one hands-on-lab:
I can also recommend the other sessions by members of the MySQL Optimizer Team:
Looking forward to meeting and talking with you at Oracle Open World!

Friday, July 18, 2014

MySQL Webinar: MySQL EXPLAIN, explained

Some time ago, Matt Lord and I delivered a webinar on the MySQL EXPLAIN feature.  This webinar is available for on-demand access here.  Based on the questions we got during the webinar, I want to emphasize that EXPLAIN does not execute the query, it only determines the query plan for the query.  Hence, EXPLAIN will not be able to evaluate how good the chosen query plan actually is.

If you have questions on this topic after listening to this webinar, feel free to ask questions; either as comments on this blog or at the MySQL Optimizer Forum.

You can also access other webinars on MySQL. New webinars will be announced here.

Monday, July 14, 2014

A Multi-Table Trick to Speed up Single-Table UPDATE/DELETE Statements

This post appeared first on mysqlserverteam.com

In MySQL, query optimization of single-table UPDATE/DELETE statements is more limited than for SELECT statements. I guess the main reason for this is to limit the optimizer overhead for very simple statements. However, this also means that optimization opportunities are sometimes missed for more complex UPDATE/DELETE statements.

Example

Using the DBT-3 database, the following SQL statement will increase prices by 10% on parts from suppliers in the specified country:
UPDATE part
SET p_retailprice = p_retailprice*1.10
WHERE p_partkey IN
     (SELECT ps_partkey
      FROM partsupp JOIN supplier
      ON ps_suppkey = s_suppkey
      WHERE s_nationkey = 4);

Visual EXPLAIN in MySQL Workbench shows that the optimizer will choose the following execution plan for this UPDATE statement:

update-subquery

That is, for every row in the part table, MySQL will check if this part is supplied by a supplier of the requested nationality. Consider the following similar SELECT statement:
SELECT * FROM part 
WHERE p_partkey IN 
     (SELECT ps_partkey 
      FROM partsupp JOIN supplier 
      ON ps_suppkey = s_suppkey 
      WHERE s_nationkey = 4);

In MySQL 5.6, the query optimizer will apply semi-join transformation to this query. Hence, the execution plan is quite different from the similar UPDATE statement:

select-semijoin

As you can see, there is no sub-query in this plan. The query has been transformed into a three-way join. The great advantage of this semi-join transformation is that the optimizer is now free to re-arrange the order of the tables to be joined. Instead of having to go through all 179,000 parts, it will now start with the estimated 414 suppliers from the given country and find all parts supplied by them. This is obviously more efficient, and it would be good if MySQL would use the same approach for the UPDATE statement.

The Multi-Table Trick

Unlike single-table UPDATE statements, the MySQL Optimizer will use all available optimizations for multi-table UPDATE statements. This means that by rewriting the query as follows, the semi-join optimizations will apply:
UPDATE part, (SELECT 1) dummy 
SET p_retailprice = p_retailprice*1.10
WHERE p_partkey IN 
     (SELECT ps_partkey 
      FROM partsupp JOIN supplier 
      ON ps_suppkey = s_suppkey 
      WHERE s_nationkey = 4);

Notice the extra dummy table in the first line. Here is what happens when I execute the single-table and multi-table variants on a DBT-3 database (scale factor 1):

mysql> UPDATE part SET p_retailprice = p_retailprice*1.10 WHERE p_partkey IN (SELECT ps_partkey FROM partsupp JOIN supplier ON ps_suppkey = s_suppkey WHERE s_nationkey = 4);
Query OK, 31097 rows affected, 28003 warnings (2.63 sec)
Rows matched: 31097  Changed: 31097  Warnings: 28003

mysql> ROLLBACK;
Query OK, 0 rows affected (0.20 sec)

mysql> UPDATE part, (SELECT 1) dummy SET p_retailprice = p_retailprice*1.10 WHERE p_partkey IN (SELECT ps_partkey FROM partsupp JOIN supplier ON ps_suppkey = s_suppkey WHERE s_nationkey = 4); 
Query OK, 31097 rows affected, 28003 warnings (0.40 sec)
Rows matched: 31097  Changed: 31097  Warnings: 28003

As you can see, execution time is reduced from 2.63 seconds to 0.40 seconds by using this trick. (I had executed both statements several times before, so the reported execution times are for a steady state with all accessed data in memory.)

Multi-Table DELETE

The same trick can be used for DELETE statements. Instead of the single-table variant,
DELETE FROM part
WHERE p_partkey IN
       (SELECT ps_partkey
        FROM partsupp JOIN supplier
        ON ps_suppkey = s_suppkey
        WHERE s_nationkey = 4);

you can use the equivalent multi-table variant:

DELETE part FROM part
WHERE p_partkey IN
       (SELECT ps_partkey
        FROM partsupp JOIN supplier
        ON ps_suppkey = s_suppkey
        WHERE s_nationkey = 4);

This rewrite gives a similar performance improvement as reported for the above UPDATE statement.

Thursday, May 2, 2013

When is a Subquery Executed?

In an earlier blog post, I managed to confuse myself as to when a subquery was executed. It is not very clear from the output of EXPLAIN where the execution of a subquery takes place. Let's take a look at the following example query (Query 17 in the DBT-3 benchmark):

select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part
where p_partkey = l_partkey
  and p_brand = 'Brand#33' and p_container = 'LG CAN'
  and l_quantity < (
 select 0.2 * avg(l_quantity)
 from lineitem
 where l_partkey = p_partkey
);

If you run EXPLAIN on this query, you will see the following execution plan:

+----+--------------------+----------+------+---------------------------------+---------------------+---------+---------------------+---------+-------------+
| id | select_type        | table    | type | possible_keys                   | key                 | key_len | ref                 | rows    | Extra       |
+----+--------------------+----------+------+---------------------------------+---------------------+---------+---------------------+---------+-------------+
|  1 | PRIMARY            | part     | ALL  | PRIMARY                         | NULL                | NULL    | NULL                | 2000000 | Using where |
|  1 | PRIMARY            | lineitem | ref  | i_l_suppkey_partkey,i_l_partkey | i_l_suppkey_partkey | 5       | dbt3.part.p_partkey |      14 | Using where |
|  2 | DEPENDENT SUBQUERY | lineitem | ref  | i_l_suppkey_partkey,i_l_partkey | i_l_suppkey_partkey | 5       | dbt3.part.p_partkey |      14 | NULL        |
+----+--------------------+----------+------+---------------------------------+---------------------+---------+---------------------+---------+-------------+

It is not straightforward to determine from this plan when the subquery will be executed. We see that both tables of the join has "Using where" in the Extra column. In other words, the original WHERE clause has been split between the two tables. The question is: Which part contains the subquery?

In this specific case, the answer is pretty evident when looking closer at the query. The subquery is part of an expression which refers to a column of the lineitem table, l_quantity, and within the subquery, there is also also a reference to p_partkey of the part table. In other words, columns from both tables of the join is needed when executing the subqquery. Hence, the subquery will be part of the where clause attached to the last table of the join.

This means that the above query will, for each row in the part table that satisfies the condition attached to the part table, look up matching rows in the lineitem table, and, for every matching row, execute the subquery. As you probably realize, one will have to know quite a bit about how the optimizer handles WHERE conditions in order to deduce this. But don't despair! An extension to EXPLAIN in MySQL 5.6 make things much clearer.

Structured EXPLAIN

With MySQL 5.6, you can get a structured version of the query plan in JSON format. All you need to do is to write "EXPLAIN FORMAT=JSON" instead of just EXPLAIN. Let's take a look at what the JSON EXPLAIN output for the query discussed above looks like:

{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "part",
          "access_type": "ALL",
          "possible_keys": [
            "PRIMARY"
          ] /* possible_keys */,
          "rows": 2000000,
          "filtered": 100,
          "attached_condition": "((`dbt3`.`part`.`p_container` = 'LG CAN') and (`dbt3`.`part`.`p_brand` = 'Brand#33'))"
        } /* table */
      },
      {
        "table": {
          "table_name": "lineitem",
          "access_type": "ref",
          "possible_keys": [
            "i_l_suppkey_partkey",
            "i_l_partkey"
          ] /* possible_keys */,
          "key": "i_l_suppkey_partkey",
          "used_key_parts": [
            "l_partkey"
          ] /* used_key_parts */,
          "key_length": "5",
          "ref": [
            "dbt3.part.p_partkey"
          ] /* ref */,
          "rows": 14,
          "filtered": 100,
          "attached_condition": "(`dbt3`.`lineitem`.`l_quantity` < (/* select#2 */ select (0.2 * avg(`dbt3`.`lineitem`.`l_quantity`)) from `dbt3`.`lineitem` where (`dbt3`.`lineitem`.`l_partkey` = `dbt3`.`part`.`p_partkey`)))",
          "attached_subqueries": [
            {
              "dependent": true,
              "cacheable": false,
              "query_block": {
                "select_id": 2,
                "table": {
                  "table_name": "lineitem",
                  "access_type": "ref",
                  "possible_keys": [
                    "i_l_suppkey_partkey",
                    "i_l_partkey"
                  ] /* possible_keys */,
                  "key": "i_l_suppkey_partkey",
                  "used_key_parts": [
                    "l_partkey"
                  ] /* used_key_parts */,
                  "key_length": "5",
                  "ref": [
                    "dbt3.part.p_partkey"
                  ] /* ref */,
                  "rows": 14,
                  "filtered": 100
                } /* table */
              } /* query_block */
            }
          ] /* attached_subqueries */
        } /* table */
      }
    ] /* nested_loop */
  } /* query_block */
}

You probably recognize much of the information from traditional EXPLAIN, but there is also some information that traditional EXPLAIN do not show. The "attached_condition" field shows which parts of the WHERE condition is evaluated at which stage of the query execution. That is, the condition attached to a table will be evaluated for each row that is read from this table. In this specific example, it shows that the subquery is part of the condition attached to the lineitem table, and the "attached_subqueries" field for that table contains the execution plan for the subquery.

Visual EXPLAIN

Structured EXPLAIN, or EXPLAIN in JSON format, is also much more suitable for tools that want to process and display query plans. MySQL Workbench 5.2 uses structural EXPLAIN to visualize MySQL query plans. Below is shown the result of Visual EXPLAIN for the example query used in this blog post:

Note that MySQL Workbench colors the table boxes based on which access method is used. A red box indicates a tables scan, while a green box is used for index look-ups (ref access).

As we all know: "A picture is worth a thousand words", and from the above picture it should be pretty clear when the subquery is executed.

Friday, February 22, 2013

When and How to Take Advantage of New Optimizer Features in MySQL 5.6

A few weeks ago, I made a presentation with the above title at FOSDEM in Brussels. My slides can be found here. The MySQL and Friends devroom had many interesting presentations and the room was full for most of them.