Monday, October 3, 2011

Batched Key Access Speeds Up Disk-Bound Join Queries

A new feature in MySQL 5.6.3 Development Milestone Release is Batched Key Access (BKA). BKA can be applied when an index lookup can be used to execute a join query. One example of such a query is:

mysql> SELECT * FROM customer JOIN orders ON c_custkey = o_custkey;

Given that the customer table is significantly smaller than the orders table, and assuming that we have an index on the o_custkey column of orders, MySQL have traditionally executed this join as follows: Scan the entire customer table, and for each row in the customer table, do an index look-up into the orders table to find matching rows. This strategy is know as Index Nested Loops Join, or as MySQL EXPLAIN puts it:

+----+-------------+----------+------+---------------+-------------+---------+-------------------------+--------+-------+
| id | select_type | table    | type | possible_keys | key         | key_len | ref                     | rows   | Extra |
+----+-------------+----------+------+---------------+-------------+---------+-------------------------+--------+-------+
|  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                    | 150000 |       |
|  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3.customer.c_custkey |      7 |       |
+----+-------------+----------+------+---------------+-------------+---------+-------------------------+--------+-------+

If BKA is applied, MySQL will instead buffer keys to be used for index look-up in the join buffer, and each time the join buffer is full, it will do a batched look-up on the index. That is, the whole set of keys from the join buffer is sent to the storage engine in one batch. For this purpose, MySQL Server uses the Multi-Range Read interface of the Storage Engine API.

Multi-Range Read (MRR)

The advantage of requesting a batch of rows from the storage engine, is that the storage engine may be able to deliver the rows more efficiently that way. The MRR interface has for some time been used in MySQL Cluster since it reduces communication between nodes by making it possible to request more than one row per round-trip.

A new feature in MySQL 5.6.2 was Disk-Sweep Multi-Range Read (DS-MRR) optimizations for both InnoDB and MyISAM. When DS-MRR is applied, the storage engine will access the data rows in the order given by the base table; instead of in the order given by the index. For a disk-bound query this has two advantages:
  1. All interesting rows on the same disk page will be accessed together. Hence, one do not risk that a page has been removed from the buffer pool between two accesses to the page.
  2. Even if modern disks/file systems do not give any guarantees, accessing the pages in table order, will normally give close to sequential access to the disk. As we know, that should give much better performance compared to random access. (If multiple sessions access the disks simultaneously, the advantage of sequential access within a session will of course be less significant.)

Turning On Batched Key Access

While BKA is advantageous in a disk-bound system, it may actually reduce performance in a CPU-bound setting. (More on this below.) Since the MySQL Optimizer currently has no information on whether a table resides in memory or on disk, BKA is off by default.

You can turn on BKA by setting an optimizer_switch flag. Since BKA depends on MRR, the MRR flag also needs to be on. (This is default.) In addition, since the cost model for MRR is currently way too conservative, in order to use MRR, cost-based MRR optimization needs to be turned off. In other words, the following settings must be done in order for the Optimizer to consider BKA:

mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';

With BKA turned on, EXPLAIN for the query presented earlier, will look like this:

+----+-------------+----------+------+---------------+-------------+---------+-------------------------+--------+----------------------------------------+
| id | select_type | table    | type | possible_keys | key         | key_len | ref                     | rows   | Extra                                  |
+----+-------------+----------+------+---------------+-------------+---------+-------------------------+--------+----------------------------------------+
|  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                    | 150000 |                                        |
|  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3.customer.c_custkey |      7 | Using join buffer (Batched Key Access) |
+----+-------------+----------+------+---------------+-------------+---------+-------------------------+--------+----------------------------------------+

Performance Impact of BKA on DBT-3 Queries

The chart below shows query execution times, with and without BKA, for all the queries of the DBT-3 benchmark where BKA can be used. The size of the database is DBT-3 scale 1 (4 GB), and in order to get disk bound queries, the size of the InnoDB buffer pool is just 50 MB. We see that, for most of the queries, the run time with BKA is around half the run time without BKA .


Note that Query 17 is an example of a query where it is not beneficial to use BKA even when the query is disk-bound. In that particular case, locality of accesses are very good without BKA, and applying BKA actually spreads accesses to related rows in time, decreasing the buffer cache hit ratio.

Effects of Increasing the Join Buffer Size

As mention above, the size of the batches used by BKA is determined by the size of the join buffer. It follows from the discussion above that the larger the batch, the more efficiently the storage engine may arrange its disk accesses. The results presented above were achieved with a default setting of join_buffer_size (128 kB). We will now look closer at what happens if we increase the size of the join buffer. For this purpose, we will use Query 13 (Customer Distribution Query) from the DBT-3 benchmark:

SELECT c_count, COUNT(*) AS custdist
FROM (
      SELECT c_custkey, COUNT(o_orderkey) AS c_count
      FROM customer LEFT OUTER JOIN orders ON
        c_custkey = o_custkey AND o_comment NOT LIKE '%express%requests%'
      GROUP BY c_custkey
     ) AS c_orders 
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;

The heart of this query is the join between customer table and orders table:

+----+-------------+------------+-------+---------------+---------------+---------+-------------------------+--------+-----------------------------------------------------+
| id | select_type | table      | type  | possible_keys | key           | key_len | ref                     | rows   | Extra                                               |
+----+-------------+------------+-------+---------------+---------------+---------+-------------------------+--------+-----------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL          | NULL    | NULL                    | 1050000 | Using temporary; Using filesort                     |
|  2 | DERIVED     | customer   | index | NULL          | i_c_nationkey | 5       | NULL                    |  150000 | Using index; Using temporary; Using filesort        |
|  2 | DERIVED     | orders     | ref   | i_o_custkey   | i_o_custkey   | 5       | dbt3.customer.c_custkey |       7 | Using where; Using join buffer (Batched Key Access) |
+----+-------------+------------+-------+---------------+---------------+---------+-------------------------+--------+-----------------------------------------------------+

Using BKA for this query, MySQL will fill the join buffer with customer keys and send a batch of keys to the storage engine. The storage engine will find the primary keys for the requested rows in the i_o_custkey index, sort the primary keys, and access the orders table in primary key order. In other words, there will be one pass over the orders table for each batch. The graph below shows the effect of increasing the join buffer on Query 13. (Note the logarithmic scale of the y-axis.) The query execution time goes down from 20 minutes with a default join buffer to less than 10 secs with a 32 MB join buffer!



BKA Performance When Queries are CPU-bound

As discussed above, BKA is designed to speed up disk-bound queries. Now we will look at what happens if BKA is used in a system where all the data in the database reside in main memory. The chart below presents the results of running the same DBT-3 queries with a InnoDB buffer pool of 5 GB. While the difference is not very significant for most queries, we see there are a few queries where the impact of BKA is quite negative.


The negative impact of BKA is not just the overhead of sorting the keys before accessing the table. If you look at the execution plan for QUERY 13 as presented above, it shows that the result of the join needs to be sorted in order to perform the group by operation of the subquery. This would not have been necessary, had not BKA been used. Without BKA, the result from the join would have been delivered in customer key order and grouping could be done without sorting. Hence, in this experiment, the execution time for Query 13 increased from 3.6 to 4.8 seconds when using BKA.

Use with Care

As I have shown, join query execution can benefit significantly from the use of Batched Key Access. However, there is also examples of queries where BKA do more harm than good. Hence, it should be used with care. My advice is to study the BKA performance for your typical queries and to only turn on BKA for queries that are known to benefit from it.