This blog post is inspired by Jesper's recent blog post on how the automatic update of InnoDB persistent statistics may never trigger if servers are restarted frequently. However, the matter is even worse! In this blog post I will show that even when the automatic recalculation is performed, there are no guarantees as to when the server will see the changes.
Myths about persistent statistics
I must admit that I, for a long time, believed in the myths I was told about when updates to InnoDB's index statistics become visible to the query optimizer. There were basically two variants:
- The updated statistics will only be visible to new connections (i.e., sessions that are started after the statistics was updated)
- When a new connection accesses the table, the updated statistics will be visible to all connections.
When does InnoDB's recalculation of statistics become visible?
We will use the following table for the experiment:con1> SHOW CREATE TABLE t2\G
*************************** 1. row ***************************
Table: t2
Create Table: CREATE TABLE `t2` (
`i` int NOT NULL,
`j` int DEFAULT NULL,
PRIMARY KEY (`i`),
KEY `j` (`j`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
con1> INSERT INTO t2 SELECT i, j FROM t1;
Query OK, 100 rows affected (0.00 sec)
Records: 100 Duplicates: 0 Warnings: 0
At this point, both tables, t1 and t2, will have 100 rows with unique values for both columns i and j. If we look at the persistent statistics, we see that both of t2's indexes contains 100 unique values:
con1> SELECT index_name, stat_name, stat_value, stat_description
-> FROM mysql.innodb_index_stats WHERE table_name = 't2';
+------------+--------------+------------+-----------------------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+-----------------------------------+
| PRIMARY | n_diff_pfx01 | 100 | i |
| PRIMARY | n_leaf_pages | 1 | Number of leaf pages in the index |
| PRIMARY | size | 1 | Number of pages in the index |
| j | n_diff_pfx01 | 100 | j |
| j | n_diff_pfx02 | 100 | j,i |
| j | n_leaf_pages | 1 | Number of leaf pages in the index |
| j | size | 1 | Number of pages in the index |
+------------+--------------+------------+-----------------------------------+
7 rows in set (0.00 sec)
If we look at the query plan for a join of the two tables, we see that the rows column for t2 contains 1. In other words, for each row in t1, there is one matching row in t2:
con1> EXPLAIN SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.i=t2.j;
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | j | 5 | NULL | 100 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | ref | j | j | 5 | test.t1.i | 1 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
We will now add 100 more rows to t2 so that there are 2 rows for each value of j:
con1> INSERT INTO t2 SELECT i+100, j FROM t1;
Query OK, 100 rows affected (0.00 sec)
Records: 100 Duplicates: 0 Warnings: 0
con1> SELECT index_name, stat_name, stat_value, stat_description
-> FROM mysql.innodb_index_stats WHERE table_name = 't2';
+------------+--------------+------------+-----------------------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+-----------------------------------+
| PRIMARY | n_diff_pfx01 | 200 | i |
| PRIMARY | n_leaf_pages | 1 | Number of leaf pages in the index |
| PRIMARY | size | 1 | Number of pages in the index |
| j | n_diff_pfx01 | 100 | j |
| j | n_diff_pfx02 | 200 | j,i |
| j | n_leaf_pages | 1 | Number of leaf pages in the index |
| j | size | 1 | Number of pages in the index |
+------------+--------------+------------+-----------------------------------+
7 rows in set (0.00 sec)
We see that the persistent statistics has been updated, but when we look at the query plan, the row estimate for t2 has not changed:
con1> EXPLAIN SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.i=t2.j;
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | j | 5 | NULL | 100 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | ref | j | j | 5 | test.t1.i | 1 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
The next step is to check what happens if we open another connection to the database:
$dbdeployer use msb_8_0_26
running /home/xxx/sandboxes/msb_8_0_26/ use
/home/xxx/sandboxes/msb_8_0_26/use
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 8.0.26 MySQL Community Server - GPL
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql [localhost:8026] {msandbox} ((none)) > prompt con2>
PROMPT set to 'con2> '
con2> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
con2> EXPLAIN SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.i=t2.j;
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | j | 5 | NULL | 100 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | ref | j | j | 5 | test.t1.i | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
We see that the row estimate for t2 has been updated. When we check our first connection, we see that the row estimate has now also changed here:
con1> EXPLAIN SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.i=t2.j;
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | j | 5 | NULL | 100 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | ref | j | j | 5 | test.t1.i | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
From this experiment it may seem that when a new connection accesses the table, the fresh statistics will be visible to all connections. However, it is not that straight-forward. When looking at the source code, we find that all connections share the same cached statistics, but this cache will only be automatically refreshed when a new table object is needed. Since closed table objects are cached for reuse, a query may not need to create a new object. So when we run our join query in the initial connection, we will reuse an existing table object, and the cached statistics will not be refreshed.
To improve scalability, the open tables cache is by default partitioned into 16 cache instances. This means that when we connect the second time, our new connection is assigned to a different instance where there are no cached table objects. Hence, a new table object is requested, and the statistics is refreshed in the process. To make all connections share the same open tables cache, we can set the system variable table_open_cache_instances to 1. Note that this is not a dynamic variable, so we will have to set it at startup.
We restart the server with this variable set to 1, and insert 100 more rows:
con1> select @@table_open_cache_instances;
+------------------------------+
| @@table_open_cache_instances |
+------------------------------+
| 1 |
+------------------------------+
1 row in set (0.00 sec)
con1> INSERT INTO t2 SELECT i+200, j FROM t1;
Query OK, 100 rows affected (0.00 sec)
Records: 100 Duplicates: 0 Warnings: 0
con1> SELECT index_name, stat_name, stat_value, stat_description
-> FROM mysql.innodb_index_stats WHERE table_name = 't2';
+------------+--------------+------------+-----------------------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+-----------------------------------+
| PRIMARY | n_diff_pfx01 | 300 | i |
| PRIMARY | n_leaf_pages | 1 | Number of leaf pages in the index |
| PRIMARY | size | 1 | Number of pages in the index |
| j | n_diff_pfx01 | 100 | j |
| j | n_diff_pfx02 | 300 | j,i |
| j | n_leaf_pages | 1 | Number of leaf pages in the index |
| j | size | 1 | Number of pages in the index |
+------------+--------------+------------+-----------------------------------+
7 rows in set (0.00 sec)
If the cached the statistics had been refreshed, we should now see that the row estimate for t2 had increased to 3, but, as expected, it is still 2:
con1> EXPLAIN SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.i=t2.j;
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | j | 5 | NULL | 100 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | ref | j | j | 5 | test.t1.i | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
This time, when we connect with a second client, the row estimates is 2 also here:
con2> EXPLAIN SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.i=t2.j;
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | j | 5 | NULL | 100 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | ref | j | j | 5 | test.t1.i | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
What happened here, was that the table object used by the first connection was reused, so the statistics was not updated. However, if we run a query that refer to t2 twice, an extra table object needs to be opened, and the statistics will be refreshed:
con2> EXPLAIN SELECT * FROM t2 AS t1 STRAIGHT_JOIN t2 ON t1.i=t2.j;
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | j | 5 | NULL | 300 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | ref | j | j | 5 | test.t1.i | 3 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
When a server has been running for a while, there will normally be multiple cached table objects in every cache instance. This means that unless you are close to your previous peak load, no new table objects will be created. In other words, it may take a long time before the query optimizer sees any of the recalculations of index statistics initiated by InnoDB. There is an old bug report that describes this behavior, but it has so far not been fixed.
ANALYZE TABLE
If we run ANALYZE TABLE, the cached statistics will be updated immediately:con1> INSERT INTO t2 SELECT i+300, j FROM t1;
Query OK, 100 rows affected (0.00 sec)
Records: 100 Duplicates: 0 Warnings: 0
con1> EXPLAIN SELECT * FROM t2 AS t1 STRAIGHT_JOIN t2 ON t1.i=t2.j;
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | j | 5 | NULL | 300 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | ref | j | j | 5 | test.t1.i | 3 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
con1> ANALYZE TABLE t2;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t2 | analyze | status | OK |
+---------+---------+----------+----------+
1 row in set (0.00 sec)
con1> EXPLAIN SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.i=t2.j;
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | j | 5 | NULL | 100 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | ref | j | j | 5 | test.t1.i | 4 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+-----------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
In other words, if you want to guarantee that the query optimizer has fresh statistics, you should run ANALYZE TABLE regularly. Below I will discuss how to best use ANALYZE TABLE.
When Should I run ANALYZE TABLE?
How often you need to run ANALYZE TABLE depends on the type of data in your index. For example, if the average number of orders per day does not change much over time, it is not necessary to refresh the statistics for an index on order_date very often. On the other hand, the same table of orders may have an index on customer_id that will require more frequent updates to the statistics since the number of orders per customer will (hopefully) increase over time. ANALYZE TABLE will update the statistics for all the indexes of a table. Hence, how often an update is needed, will be depend on the index where the average number of rows per value changes most rapidly.
In most cases, running it once per week should be sufficient, but there might be cases where it could be useful to run it more often, so my suggestion is to run in on a daily basis at a time where the load is low. Also, always make sure to run ANALYZE TABLE after bulk inserts/updates/deletes. If you run ANALYZE TABLE on a regular basis, I also suggest you turn off InnoDB's automatic recalculation by setting the system variable innodb_stats_auto_recalc to OFF.
Increase the sampling rate
The number of index pages that are sampled when calculating index statistics, is controlled by the system variable innodb_stats_persistent_sample_pages. By default, at most 20 pages are sampled, but this often results in very inaccurate estimates, and I suggest you increase this value to at least 200. If the automatic recalculation is turned off, so you have control over when the recalculation is performed, it should be OK to set it even higher. On my system, ANALYZE TABLE is able to process 5000 pages per second even when most of the pages have to be read from disk. In other words, if you set innodb_stats_persistent_sample_pages to 1000, you should be able to recalculate the index statistics for 300 indexes in less than a minute!
ANALYZE TABLE is no longer blocking
As Jesper wrote in his blog post, it used to be that ANALYZE TABLE would block all new queries until ongoing queries had completed. In order to safely rely on ANALYZE TABLE, you should make sure to use a release where this have been fixed. The issue was first fixed in Percona's releases a few years ago, and at Alibaba we have ported this fix to POLARDB. This year, MySQL 8.0.24 also provided a fix for this issue.
Conclusions
In this blog post I have shown that there is no guarantee that the index statistics is up-to-date if you solely rely on InnoDB's automatic updates. Instead, you should regularly run ANALYZE TABLE. To ensure more accurate statistics, it is also a good idea to increase the setting for innodb_stats_persistent_sample_pages.
I want to thank my colleague Kaiwang for insights and discussions that helped me understand this issue better.