Friday, March 17, 2017

MySQL 5.7: Improved JOIN Order by Taking Condition Filter Effect into Account

One of the major challenges of query optimizers is to correctly estimate how many rows qualify from each table of a join. If the estimates are wrong, the optimizer may choose a non-optimal join order.

Before MySQL 5.7, the estimated number of rows from a table only took into account the conditions from the WHERE clause that were used to set up the access method (e.g., the size of an index range scan). This often led to row estimates that were far too high, resulting in very wrong cost estimates for join plans. To address this issue, MySQL 5.7 introduced a cost model that considers the entire WHERE condition when estimating the number of qualifying rows from each table. This model estimates the filtering effect of the table's conditions.

As shown in the figure above, the condition filter effect reduces the estimated number of rows from tx that will lead to an index look-up on tx+1. For more details on how condition filtering works, see my two earlier blog posts on this topic: part 1 and part 2.
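To make this concrete, here is a small sketch with made-up numbers of how a filtering estimate scales the fan-out of one join step:

```python
# Sketch with hypothetical numbers: the access method estimates how many
# rows are read from a table, and the condition filter estimate scales
# that number down to the rows that survive the remaining WHERE
# conditions and trigger look-ups into the next table.

def fanout(access_rows, filter_pct):
    """Estimated rows from this table that lead to look-ups on the next one."""
    return access_rows * filter_pct / 100.0

# An index range scan estimates 1000 rows, but a non-index condition is
# estimated to let only 10% of them through:
lookups_without_filtering = fanout(1000, 100.0)  # 1000 look-ups into tx+1
lookups_with_filtering = fanout(1000, 10.0)      # only 100 look-ups into tx+1
```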

Taking condition filtering into account, the join optimizer will in many cases be able to find a more optimal join order. However, there are cases where the optimizer will overestimate the filtering effect and choose a non-optimal query plan. In this blog post, I will show an example of a query that benefits from condition filtering, and in a follow-up blog post I will discuss what could be done if condition filtering does not have the desired effect.

Example: DBT-3 Query 8

To show the benefits of condition filtering, we will look at Query 8 in the DBT-3 benchmark:

SELECT o_year,
       SUM(CASE WHEN nation = 'FRANCE' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (
    SELECT EXTRACT(YEAR FROM o_orderdate) AS o_year,
           l_extendedprice * (1 - l_discount) AS volume, n2.n_name AS nation
    FROM part
    JOIN lineitem ON p_partkey = l_partkey
    JOIN supplier ON s_suppkey = l_suppkey
    JOIN orders ON l_orderkey = o_orderkey
    JOIN customer ON o_custkey = c_custkey
    JOIN nation n1 ON c_nationkey = n1.n_nationkey
    JOIN region ON  n1.n_regionkey = r_regionkey   
    JOIN nation n2 ON s_nationkey = n2.n_nationkey
    WHERE r_name = 'EUROPE' AND o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
      AND p_type = 'PROMO BRUSHED STEEL'
) AS all_nations GROUP BY o_year ORDER BY o_year;

Query 8 is called the National Market Share Query; it finds the market share in Europe for French suppliers of a given part type. You do not need to understand this query in detail. The main point is that 8 tables are joined, and that it is important to find an efficient join order for the query to perform well.

In MySQL 5.6, Visual EXPLAIN shows the following query plan for Query 8:

Tables are joined from left-to-right, starting with a full table scan of the region table, doing index look-ups into all other tables, with the part table as the last table. The execution time for this query is around 6 seconds on a scale factor 1 DBT-3 database.

If we look at the WHERE clause of the query, we see that there is one condition with high selectivity: we are interested in just one of many possible part types. The conditions that the region should be Europe and that the time period is restricted to two years will still leave many candidate rows, so these conditions have low selectivity. In other words, a query plan that puts the part table at the end is not optimal. If part were processed early, we would be able to filter out many rows, and the number of index look-ups into the other tables could be significantly reduced.

In MySQL 5.7, the use of condition filtering estimates changes the query plan for Query 8, and the new query plan looks as follows:

As you can see, part is now processed early. (We see that region is still processed first, but this is a small table with only 5 rows, of which just one matches its condition. Hence, it will not impact the fan-out of the join.) This new query plan takes 0.5 seconds to execute. In other words, the execution time is reduced by 92% by changing the join order! The main saving is that, instead of going through all orders for customers in Europe, one only looks at orders of parts of the given type.

(The observant reader will have noticed another difference between the Visual EXPLAIN diagrams: the box around the tables is no longer present in the 5.7 diagram. If you look at Query 8, you will notice that the many-table join is in a subquery in the FROM clause, also known as a derived table. In MySQL 5.6 and earlier, the result of a derived table was materialized in a temporary table before the main query was executed. The extra box in the 5.6 diagram represents such a temporary table. In MySQL 5.7, derived tables are merged into the outer query when possible, and the materialization of the derived table is avoided.)

To see the optimizer's estimates for condition filter effects, we can look at the filtered column of tabular EXPLAIN (some of the columns have been removed to save space):

id select_type table type key rows filtered Extra
1 SIMPLE region ALL NULL 5 20.00 Using where; Using temporary; Using filesort
1 SIMPLE part ALL NULL 200000 10.00 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE lineitem ref i_l_partkey_suppkey 30 100.00 Using index condition
1 SIMPLE supplier eq_ref PRIMARY 1 100.00 Using where
1 SIMPLE n2 eq_ref PRIMARY 1 100.00 NULL
1 SIMPLE orders eq_ref PRIMARY 1 50.00 Using where
1 SIMPLE customer eq_ref PRIMARY 1 100.00 Using where
1 SIMPLE n1 eq_ref PRIMARY 1 20.00 Using where
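As a rough sketch (ignoring the other factors in the cost model), the rows and filtered columns combine multiplicatively into the optimizer's estimate of the join prefix size after each table:

```python
# Rough sketch: how the rows and filtered estimates combine into the
# estimated number of row combinations after each table in the join
# order. Numbers are taken from the EXPLAIN output above.

explain = [            # (table, rows, filtered %)
    ("region",   5,      20.0),
    ("part",     200000, 10.0),
    ("lineitem", 30,     100.0),
    ("supplier", 1,      100.0),
    ("n2",       1,      100.0),
    ("orders",   1,      50.0),
    ("customer", 1,      100.0),
    ("n1",       1,      20.0),
]

prefix = 1.0
for table, rows, filtered in explain:
    prefix *= rows * filtered / 100.0
    print(f"{table:10s} prefix rows ~ {prefix:,.0f}")
```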

We can see that there are 4 tables for which the optimizer assumes some filtering: region, part, orders, and n1 (nation). The optimizer has three sources for its estimates:

  1. Range estimates:

    For indexed columns, the range optimizer will ask the storage engine for an estimate of how many rows are within a given range. This estimate will be pretty accurate. For our query, this is the case for the region and orders tables: Europe is 1 out of 5 regions, and the requested two-year period contains 50% of the orders in the database.

  2. Index statistics:

    For indexed columns, MySQL also keeps statistics on the average number of rows per value (records_per_key). This can be used to estimate the filtering effect of conditions that refer to columns of preceding tables in the join order. For our query, this is the case for n1. The query has two conditions involving n1, but the condition on n_nationkey is used for the index look-up and will not contribute to extra filtering. On the other hand, the condition on n_regionkey will provide some extra filtering, and since records_per_key tells us that there are 5 distinct values for this column, the estimated filtering effect will be 20%.

    The assumption is that values are evenly distributed. If the distribution is skewed, the filtering estimate will be less accurate. Another cause of estimation errors is that index statistics are based on sampling, so it may not precisely reflect the actual distribution of values.

  3. Guesstimates:

    For non-indexed columns, MySQL does not have any statistics, so the optimizer resorts to heuristics. For equality conditions, the filtering effect is assumed to be 10%. The DBT-3 database does not have an index on p_type. Hence, the filtering effect for the part table is set to 10%. This is actually a bit off since there are 150 different part types. However, in this case, just assuming that there is some filtering made the optimizer choose a better plan.
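A small sketch of the last two estimate sources: with index statistics, the filtering effect of an equality condition comes out as the fraction of rows per distinct value, while without statistics a fixed 10% guesstimate is used:

```python
# With index statistics, the filtering effect of an equality condition
# is records_per_key divided by the table size, i.e. the fraction of
# rows per distinct value (assuming an even distribution).

def filter_from_index_stats(table_rows, records_per_key):
    return records_per_key / table_rows

EQUALITY_GUESSTIMATE = 0.10   # MySQL's default for non-indexed columns

# n1: nation has 25 rows, 5 per n_regionkey value -> 20% filtering effect
n1_filter = filter_from_index_stats(25, 5)

# part: p_type is not indexed, so the 10% guess is used, although the
# actual selectivity with 150 distinct part types is about 1/150.
actual_p_type_filter = 1 / 150
overestimate = EQUALITY_GUESSTIMATE / actual_p_type_filter   # ~15x too high
```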

Caveat

As shown in this blog post, taking condition filtering into account may give better query plans. However, as discussed, there is a risk that the filtering estimates are inaccurate, especially for conditions on non-indexed columns. Another cause of estimation errors is that columns are assumed to be uncorrelated. Hence, when there are conditions on correlated columns, the optimizer will overestimate the filtering effect of those conditions.

We have received a few bug reports on performance regressions when upgrading from 5.6 to 5.7 that are caused by the optimizer overestimating the filtering effect. In most cases, this is because the query contains equality conditions on non-indexed columns with low cardinality, so the guesstimate of 10% is too optimistic. In my next blog post, I will discuss what can be done when this happens. We are also working on adding histograms to MySQL. Histograms will give a much better basis for estimating the filtering effects of conditions on non-indexed columns.

Saturday, February 4, 2017

Presentations at pre-FOSDEM'17 MySQL Day

I am currently at FOSDEM in Brussels attending a lot of interesting presentations in the MySQL and Friends devroom.  Yesterday the MySQL community team organized a pre-FOSDEM MySQL Day, where I delivered two talks.  The slides for my talks can be found on Slideshare:

Monday, January 16, 2017

Improving the Stability of MySQL Single-Threaded Benchmarks

I have for some years been running the queries of the DBT-3 benchmark, both to verify the effect of new query optimization features and to detect any performance regressions that may have been introduced. However, I have had issues with getting stable results. While repeated runs of a query are very stable, I can get quite different results if I restart the server. As an example, I got a coefficient of variation (CoV) of 0.1% for 10 repeated executions of a query on the same server, while the CoV for the average runtime of 10 such experiments was over 6%!

With such large variation in results, significant performance regressions may not be noticed. I have tried a lot of stuff to get more stable runs, and in this blog post I will write about the things that I have found to have positive effects on stability. At the end, I will also list the things I have tried that did not show any positive effects.
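For reference, the coefficient of variation is the standard deviation relative to the mean; here is a small sketch with made-up run times:

```python
# CoV = standard deviation / mean, for a set of measured run times.
from statistics import mean, pstdev

def cov(times):
    return pstdev(times) / mean(times)

# Repeated runs on one server start are very stable (CoV ~0.1%) ...
same_start = [10.00, 10.01, 9.99, 10.02, 10.00]
# ... while averages across server restarts can vary far more (CoV > 6%):
across_starts = [10.0, 11.2, 9.5, 10.8, 11.5]

print(f"within run:  {cov(same_start):.2%}")
print(f"across runs: {cov(across_starts):.2%}")
```

The numbers above are illustrative, not actual measurements.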

Test Environment

First, I will describe the setup for the tests I run. All tests are run on a 2-socket box running Oracle Linux 7. The CPUs are Intel Xeon E5-2690 (Sandy Bridge) with 8 physical cores @ 2.90GHz.

I always bind the MySQL server to a single CPU, using taskset or numactl, and Turbo Boost is disabled. The computer has 128 GB of RAM, and the InnoDB buffer pool is big enough to contain the entire database. (4 GB buffer pool for scale factor 1 and 32 GB buffer pool for scale factor 10.)

Each test run is as follows:

  1. Start the server
  2. Run a query 20 times in sequence
  3. Repeat step 2 for all DBT-3 queries
  4. Stop the server

The result for each query will be the average execution times of the last 10 runs. The reason for the long warm-up period is that, from experience, when InnoDB's Adaptive Hash Index is on, you will need 8 runs or so before execution times are stable.
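The measurement scheme can be summarized in a few lines (a sketch; run_times would come from the actual benchmark driver):

```python
# Each query is executed 20 times; the first 10 runs serve as warm-up,
# and the reported result is the average of the last 10 runs.
def query_result(run_times, warmup=10):
    measured = run_times[warmup:]
    return sum(measured) / len(measured)

# e.g., slow warm-up runs do not affect the reported result:
result = query_result([2.0] * 10 + [1.0] * 10)  # 1.0
```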

As I wrote above, the variance within each test run is very small, but the difference between test runs can be large. The variance is somewhat improved by picking the best result out of three test runs, but it is still not satisfactory. Also, a full test run on a scale factor 10 database takes 9 hours, so I would like to avoid having to repeat the tests multiple times.

Address Space Layout Randomization

A MySQL colleague mentioned that he had heard about some randomization that was possible to disable. After some googling, I learned about Address Space Layout Randomization (ASLR). This is a security technique that is used to prevent an attacker from being able to determine where code and data are located in the address space of a process. I also found some instructions on Stack Overflow for how to disable it on Linux.

Turning off ASLR sure made a difference! Take a look at this graph that shows the average execution time for Query 12 in ten different test runs with and without ASLR (All runs are with a scale factor 1 DBT-3 database on MySQL 8.0.0 DMR):

I will definitely make sure ASLR is disabled in future tests!

Adaptive Hash Index

InnoDB maintains an Adaptive Hash Index (AHI) for frequently accessed pages. The hash index speeds up look-ups on the primary key, and is also useful for secondary index accesses since a primary key look-up is needed to get from the index entry to the corresponding row. Some DBT-3 queries run twice as slow if I turn off AHI, so it definitely has some value. However, experience shows that I have to repeat a query several times before the AHI is actually built for all pages accessed by the query. I plan to write another blog post where I discuss AHI in more detail.

Until I stumbled across ASLR, turning off AHI was my best bet at stabilizing the results. After disabling ASLR, also turning off AHI only shows a slight improvement in stability. However, there are other reasons for turning off AHI.

I have sometimes observed that with AHI on, a change of query plan for one query may affect the execution time of subsequent queries. I suspect the reason is that the content of the AHI after a query has been run may change with a different query plan. Hence, the next query may be affected if it accesses the same data pages.

Turning off AHI also means that I no longer need the long warm-up period for the timing to stabilize. I can then repeat each query 10 times instead of 20 times. This means that even if many of the queries take longer to execute, the total time to execute a test run will be lower.

Because of the above, I have decided to turn off AHI in most of my test runs. However, I will run with AHI on once in a while just to make sure that there are no major regressions in AHI performance.

Preload Data and Indexes

I also tried to start each test run with a set of queries that would sequentially scan all tables and indexes. My thinking was that this could give a more deterministic placement of data in memory. Before I turned off ASLR, preloading had very good effects on the stability when AHI was disabled. With ASLR off, the difference is less significant, but there is still a slight improvement.

Below is a table that shows some results for all combinations of the settings discussed so far. Ten test runs were performed for each combination on a scale factor 1 database. The numbers shown are the average difference between the best and the worst runs over all queries, and the largest difference between the best and the worst runs for a single query.

ASLR AHI Preload Avg(MAX-MIN) Max(MAX-MIN)
6.18% 14.75%
4.65% 14.79%
5.56% 14.65%
2.18% 5.05%
1.66% 3.94%
1.58% 3.58%
1.66% 3.78%
1.09% 3.27%

From the above table it is clear that the most stable runs are achieved by using preloading in combination with disabling both ASLR and AHI.
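The two metrics in the table can be sketched as follows (made-up numbers; normalizing the best-to-worst difference by the best run is my assumption here):

```python
# Stability metrics sketch: for each query, take the relative difference
# between the best and the worst of the test runs, then report the
# average and the maximum of that difference over all queries.
# (Normalizing by the best run, rather than the mean, is an assumption.)

def max_min_pct(runs):
    return (max(runs) - min(runs)) / min(runs)

# runs_per_query: {query: [avg runtime from each test run]} - made up
runs_per_query = {
    "Q1": [1.00, 1.02, 1.01],
    "Q2": [2.00, 2.10, 2.05],
}
diffs = [max_min_pct(r) for r in runs_per_query.values()]
avg_diff = sum(diffs) / len(diffs)   # -> Avg(MAX-MIN)
max_diff = max(diffs)                # -> Max(MAX-MIN)
```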

For one of the DBT-3 queries, using preloading on a scale factor 10 database leads to higher variance within a test run. While the CoV within a test run is below 0.2% for all queries without preloading, query 21 has a CoV of 3% with preloading. I am currently investigating this, and I have indications that the variance can be reduced by setting the memory placement policy to interleave. I guess the reason is that with a 32 GB InnoDB buffer pool, one will not be able to allocate all memory locally to the CPU where the server is running.

What Did Not Have an Effect?

Here is a list of things I have tried that did not seem to have a positive effect on the stability of my results:

  • Different governors for CPU frequency scaling. I have chosen the performance governor, because it "sounds good", but I did not see any difference when using the powersave governor instead. I also tried turning off the Intel pstate driver, but did not notice any difference in that case either.
  • Bind the MySQL server to a single core or thread (instead of CPU).
  • Bind the MySQL client to a single CPU.
  • Different settings for NUMA memory placement policy using numactl. (With the possible exception of using interleave for scale factor 10 as mentioned above.)
  • Different memory allocation libraries (jemalloc, tcmalloc). jemalloc actually seemed to increase the instability.
  • Disable InnoDB buffer pool preloading: innodb_buffer_pool_load_at_startup = off
  • Set innodb_old_blocks_time = 0

Conclusion

My tests have shown that I get better stability of test results if I disable both ASLR and AHI, and that combining this with preloading of all tables and indexes will in many cases further improve the stability of my test setup. (Note that I do not recommend disabling ASLR in a production environment; that could make you more vulnerable to attacks.)

I welcome any comments and suggestions on how to further increase the stability for MySQL benchmarks. I do not claim to be an expert in this field, and any input will be highly appreciated.

Friday, December 23, 2016

Improved Query Performance by Using Derived Table instead of IN-subquery

MySQL 5.6 introduced semi-join transformation for IN-subqueries. This opened up several new ways to execute such subqueries; one was no longer tied to executing the subquery once for every row of the outer query. This dramatically improved the performance of many such queries. However, the semi-join transformation does not apply to all types of queries. For example, if the subquery has a GROUP BY clause, semi-join does not apply. (For a full list of the types of queries that cannot use semi-join, see the Reference Manual.)
Fortunately, MySQL 5.6 also introduced subquery materialization, which can be used when semi-join does not apply, as long as the subquery is not dependent on the outer query (i.e., there are no references to tables of the outer query within the subquery). In an earlier blog post, I showed how the query execution time for DBT-3 Query 18 improved from over a month to a few seconds due to subquery materialization. In this blog post, I will show how we can further improve the performance of Query 18 by rewriting it.
Let's first take a look at Query 18 in its original form:
SELECT c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, SUM(l_quantity)
FROM customer, orders, lineitem
WHERE o_orderkey IN (
      SELECT l_orderkey
      FROM lineitem
      GROUP BY l_orderkey
      HAVING SUM(l_quantity) > 313
    )
  AND c_custkey = o_custkey
  AND o_orderkey = l_orderkey
GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
ORDER BY o_totalprice DESC, o_orderdate
LIMIT 100;
This query is called Large Volume Customer Query since it finds large orders (of more than 313 items) and returns the top 100 orders; ordered by total price and date. In MySQL 5.5 and earlier, MySQL would execute the GROUP BY subquery once for every row of the outer query. In MySQL 5.6, subquery materialization made it possible to execute this subquery only once. This can be seen from the EXPLAIN output for this query (some of the columns have been removed to save space):
id select_type table type key rows Extra
1 PRIMARY orders ALL NULL 1500000 Using where; Using temporary; Using filesort
1 PRIMARY customer eq_ref PRIMARY 1 NULL
1 PRIMARY lineitem ref i_l_orderkey_quantity 4 Using index
2 SUBQUERY lineitem index i_l_orderkey_quantity 6001215 Using index

That select_type is SUBQUERY indicates that the subquery will be executed only once. (Otherwise, the select_type would be DEPENDENT SUBQUERY.) Here is the Visual EXPLAIN diagram for the same query plan:
A basic component of both semi-join execution strategies and subquery materialization is duplicate removal. Unlike a JOIN operation where the result will contain all combinations of rows that match the join criteria, an IN-expression should only produce one row regardless of the number of matches in the subquery. For semi-join there are multiple strategies for removing duplicates, and for subquery materialization, duplicates are removed from the temporary table.
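The difference in duplicate semantics can be sketched in a few lines of Python, with lists standing in for the joined key columns:

```python
# A JOIN produces one result row per matching combination of rows,
# while IN produces at most one match per outer row, no matter how many
# subquery rows match.

outer = [1, 2, 3]
inner = [2, 2, 3]   # note the duplicate value

join_result = [o for o in outer for i in inner if o == i]   # [2, 2, 3]
in_result   = [o for o in outer if o in inner]              # [2, 3]
```

Since the subquery in Query 18 groups on l_orderkey, its result is already duplicate-free, which is why the two forms coincide here.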
Looking at Query 18, we realize that the subquery will never produce any duplicates. The only column selected is the column we are grouping on, l_orderkey, so all rows will be distinct. This means that we can replace the IN-expression with a join and get this equivalent query:
SELECT c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, SUM(lineitem.l_quantity)
FROM customer, orders, lineitem,
     ( SELECT l_orderkey
       FROM lineitem
       GROUP BY l_orderkey
       HAVING SUM(l_quantity) > 313
     ) l2
WHERE o_orderkey = l2.l_orderkey
  AND c_custkey = o_custkey
  AND o_orderkey = lineitem.l_orderkey
GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
ORDER BY o_totalprice DESC, o_orderdate
LIMIT 100;

The subquery is now a derived table; a subquery in the FROM clause. Take a look at the query plan for the rewritten query:
Notice that the subquery is materialized as before, but this time the plan is to start with scanning the temporary table instead of the orders table as for the original query. This means that instead of accessing all orders, MySQL will now only access the biggest orders (with more than 313 items). Given that most orders are smaller than that, this should give improved performance. Indeed, with MySQL 5.7.14 and using a scale factor 1 database, the query execution time is slashed by one third by this rewrite:
In general, rewriting the query from an IN-expression to a JOIN is beneficial because the join optimizer may then be used to find the optimal order for accessing the tables. This is the idea behind the semi-join transformations that are done automatically since MySQL 5.6, and this is why our manual transformation gave a better query plan. Hopefully, some time in the future the MySQL Query Optimizer will be able to do such transformations automatically. Until then, watch out for opportunities to manually rewrite IN-subqueries where semi-join does not apply!

Friday, October 21, 2016

Presentations at OpenWorld and PerconaLive

During the last month, I presented on MySQL both at Oracle OpenWorld and Percona Live in Amsterdam.  The slides from the presentations have been uploaded to the conference web sites,  and you also find the slides at Slideshare:

Saturday, September 17, 2016

MySQL Optimizer Sessions at Oracle OpenWorld

Oracle OpenWorld starts soon, and there will be a few sessions on the MySQL Optimizer.  On Monday, I will give my tutorial on how to analyze and tune MySQL queries.  Later in the week, Manyi Lu, the MySQL optimizer team manager, will present what is new in the MySQL 5.7 optimizer, and also give a sneak peek into the MySQL 8.0 release.  Together, we will present Common Table Expressions, a new SQL feature in MySQL 8.0.  The details are as follows:

Monday, Sep 19, 1:45 p.m. 
How to Analyze and Tune MySQL Queries for Better Performance [TUT3750]
Oystein Grovlen, Senior Principal Software Engineer, Oracle

Wednesday, Sep 21, 3:00 p.m.
MySQL Optimizer: What’s New in 5.7 and Sneak Peek at 8.0 [CON6112]
Manyi Lu, Director Software Engineering, Oracle

Thursday, Sep 22, 12:00 p.m.
MySQL 8.0: Common Table Expressions [CON7928]
Manyi Lu, Director Software Engineering, Oracle
Oystein Grovlen, Senior Principal Software Engineer, Oracle


All MySQL tutorials and conference sessions are this year at The Park Central hotel.  For more information on MySQL sessions, see Focus on MySQL @ OpenWorld.  OpenWorld participants can build their own OpenWorld agenda and reserve seats for sessions by going to the OpenWorld home page and selecting Tools and Resources → My Schedule.

We would also like to invite people to the MySQL community reception at Jillian's on Tuesday, September 20 @ 7 pm.  This reception is not limited to OpenWorld registrants.  So if you are in the SF area, please feel free to come and meet MySQL developers and users!

Friday, March 4, 2016

Oracle Virtual Technology Summit

In the coming weeks, Oracle Technology Network welcomes you to join the Virtual Technology Summit which is a half-day online conference with multiple tracks.  This time there is a MySQL track, and I will do a presentation on Analyze & Tune MySQL Queries for Better Performance.  The MySQL track also contains a presentation on MySQL 5.7, and a presentation on how to use Oracle Enterprise Manager to manage MySQL databases.

There will be three events with the same program, but at different times to best suit different parts of the world.   The first opportunity is the Americas event on March 8.  Later, there will be events for Asia/Pacific (March 15) and Europe/Middle East/Africa (April 5).  

To register, go to the pages for the Americas, Asia/Pacific, and Europe/Middle East/Africa events, respectively.  It will also be possible to listen to the webcast after the events, but then you will miss the opportunity to ask questions as we go along.