Wednesday, June 9, 2021

Use Window Functions to Speed Up Correlated Subqueries

In my previous blog post, I discussed different ways to rewrite correlated scalar subqueries to improve performance. In this blog post, I will discuss how we can use window functions for a subset of these queries. Window functions was introduced in MySQL 8.0.

Use case

Often we run queries to find rows within a table with values that are, e.g., smaller, greater, or equal to some metric computed over a set of rows from the same table. Some examples are:

  • Find orders for more than 1% of the total revenue for the last week.
  • Find the supplier with the lowest price for a certain part.
  • Find orders for less than 20% of the average ordered quantity of a part.

The latter is a descripton of TPC-H Q17, which we discussed in my previous blog post:

SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly 
FROM lineitem JOIN part ON p_partkey = l_partkey 
WHERE p_brand = 'Brand#11' AND p_container = 'SM CAN' 
  AND l_quantity < 
      (SELECT 0.2 * AVG(l_quantity) 
       FROM lineitem 
       WHERE l_partkey = p_partkey);

This query refers to the lineitem table twice:

  1. The subquery finds the average ordered quantity of a part
  2. The outer query access lineitem to find the price for the parts when less than 20% of the average quantity is ordered.

Rewrite to use window function

If we use window functions, we can compute the average AND collect the interesting information at the same time:

WITH win AS (
    SELECT l_extendedprice, l_quantity,
           AVG(l_quantity) OVER (PARTITION BY p_partkey) avg_l_quantity
    FROM lineitem, part
    WHERE p_partkey = l_partkey
      AND p_brand = 'Brand#11' AND p_container = 'SM CAN'
)	
SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM win
WHERE l_quantity < 0.2 * avg_l_quantity;
Here the Common Table Expression (CTE) will both fetch the necessary column values and compute the average quantity for each part. Then, the main query can just filter the rows from the CTE where l_quantity is less than 20% of the average.

There are some conditions that needs to be satisfied in order to safely apply this transformation:

  • The subquery refers a subset of the tables in the outer query
  • The subquery uses aggregation and there is no DISTINCT within the aggregation function
  • The aggregation function must have an equivalent window aggregation function
  • If there are multiple tables that are referred in the FROM clauses of both the subquery and the outer query, they should be joined on the same conditions
  • Any conditions in the WHERE clause should be the same for both subquery and outer query
  • The column(s) from the outer query that is referred in the subuqery, p_partkey in the case of Q17 above, should be unique
  • No ORDER BY or LIMIT in the subquery
There are cases where we could still apply the transformation even if all conditions above are not met. However, then there is a risk that the transformed query will be less efficient than the original query.

Performance

Let us compare the performance of this query to the variants we tested in my previous blog post:

We see that the query that use window functions is more than 5 times faster than the original query, but it is not faster than the variant that use a Lateral Derived Table. In fact, it is around 2.5% slower.

How can it be slower when it accesses the lineitem table only once, while the query that use LATERAL access the table twice? The answer is that there is some overhead associated with the window function. Here is a flame graph for the query that use a Lateral Derived table:

The left-most and right-most stacks represents the index look-ups into the lineitem table. Notice that the stack to the left is somewhat narrower. As discussed in the previous blog post, this is because there will be less cache misses on the second access.

Compare this to the flame graph for the query that use a window function:

In this case there is only one stack for the function RefIterator<false>::Read, but there is a new stack to the right which represents the calculation by the window function. In other words, we save some, and we loose some, and overall the performance is about the same for both query variants.

TPC-H Query 2

Query 2 of the TPC-H benchmark can also be rewritten to use a window function. This query finds information on the suppliers in Europe with the lowest cost for certain parts:

SELECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, 
       s_address, s_phone, s_comment
FROM part, supplier, partsupp, nation, region
WHERE p_partkey = ps_partkey AND s_suppkey = ps_suppkey
  AND p_size = 27 AND p_type LIKE '%BRASS'
  AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
  AND r_name = 'EUROPE'
  AND ps_supplycost = (
      SELECT MIN(ps_supplycost)
	  FROM partsupp, supplier, nation, region
	  WHERE p_partkey = ps_partkey AND s_suppkey = ps_suppkey 
        AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
	    AND r_name = 'EUROPE'
  )
ORDER BY s_acctbal DESC, n_name, s_name, p_partkey
LIMIT 100;
 
In this query there are multiple tables that are referred in both query blocks. The rewritten query that use window functions looks like this:
WITH win AS (
    SELECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, 
           s_address, s_phone, s_comment, ps_supplycost,
           MIN(ps_supplycost) OVER (PARTITION BY p_partkey) min_cost
    FROM partsupp, supplier, nation, region, part
    WHERE p_partkey = ps_partkey AND s_suppkey = ps_suppkey 
      AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
      AND p_size = 27 AND p_type LIKE '%BRASS' AND r_name = 'EUROPE'
)    
SELECT s_acctbal, s_name, n_name, p_partkey, p_mfgr,
       s_address, s_phone, s_comment
FROM win
WHERE ps_supplycost = min_cost
ORDER BY s_acctbal DESC, n_name, s_name, p_partkey
LIMIT 100;

Here is a comparison of the execution time for the different possible query transformation for Q2:

For this query, using a lateral derived table is almost 20% slower than using a window function. This time, both using the variant with table push-down and using a window function are about 40% faster than the original query.

Conclusions

I have in this blog post shown that we can improve query performance by rewriting queries with correlated scalar subqueries to use window functions. For both example queries presented here, the performance when using a window function is on par with alternative ways to rewrite the query. Also, all the other alternatives are slower than using a window function for at least one of the queries.

At Alibaba, we have implemented the query transformation to window functions in our Innovative Version of POLARDB for MySQL. Unlike the subquery transformation introduced in MySQL 8.0.24, which is off by default since it may often significantly reduce the query performance, our transformation can be applied automatically. As long as the query satisfies the conditions listed above, it should not be any slower than the original query. In other words, with POLARDB you will no longer have to manually rewrite your query to get the benefits discussed in this blog post.

No comments:

Post a Comment