
MySQL obviously got many things right; otherwise, it would not be the World’s Most Popular Open Source Database (according to DB-Engines). Sometimes, however, I run into decisions or behaviors which are just plain bad design. Many of them have a lot of historical reasoning behind them, and maybe they are still here because not enough resources are allocated to cleaning up technical debt.
I’m passionate about observability, especially when it comes to understanding system performance. One of the most important pieces of data for understanding MySQL performance is its latch contention (mutexes, rwlocks, etc.).
The “best” way to understand latches in MySQL is Performance Schema. Unfortunately, latch profiling is disabled by default in Performance Schema because it causes significant overhead, enough that you are unlikely to run with this instrumentation in production at all times.
If you’re looking for some always-available mutex information from MySQL, you can get it from the InnoDB storage engine (which is often good enough, as this is where most contention happens).
One option is the SHOW ENGINE INNODB STATUS output, particularly its SEMAPHORES section:
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 490020789
--Thread 140407582807808 has waited at row0ins.cc line 2412 for 0 seconds the semaphore:
S-lock on RW-latch at 0x7fa0159bd6f0 created in file buf0buf.cc line 785
a writer (thread id 140407910762240) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 2412
Last time write locked in file /mnt/workspace/percona-server-8.0-debian-binary/label_exp/min-bionic-x64/test/percona-server-8.0.18-9/storage/innobase/include/mtr0mtr.ic line 142
--Thread 140386577712896 has waited at row0ins.cc line 2412 for 0 seconds the semaphore:
S-lock on RW-latch at 0x7fa0159bd6f0 created in file buf0buf.cc line 785
a writer (thread id 140407910762240) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 2412
Last time write locked in file /mnt/workspace/percona-server-8.0-debian-binary/label_exp/min-bionic-x64/test/percona-server-8.0.18-9/storage/innobase/include/mtr0mtr.ic line 142
This section shows which latches threads are currently waiting on, along with how long they have waited, which can be very helpful. Unfortunately, this information comes in a form that is not easily parsable and can only be retrieved together with the whole SHOW ENGINE INNODB STATUS output, causing extra load and making it a poor fit for high-frequency sampling.
Why this information was never made accessible through an INFORMATION_SCHEMA table is a great puzzle. Was it because the idea was that PERFORMANCE_SCHEMA should be the one and only tool for observability, even when the MySQL engineering team can’t get it to perform with acceptable overhead?
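To give a feel for what parsing this section involves, here is a minimal sketch that counts current waiters per source location. It assumes the "--Thread ID has waited at FILE line N for S seconds" line shape from the sample above and that the status text arrives on stdin with real line breaks; the function name count_semaphore_waiters and the file name innodb_status.txt are my own placeholders, not anything MySQL provides.

```shell
# Sketch only: count threads currently waiting on a latch, grouped by the
# source location they wait at. Reads SHOW ENGINE INNODB STATUS text on stdin.
# count_semaphore_waiters is a hypothetical helper name.
count_semaphore_waiters() {
  awk '
    # Line shape (taken from the sample output):
    #   --Thread <id> has waited at <file> line <n> for <s> seconds the semaphore:
    /^--Thread [0-9]+ has waited at / {
      loc = $6 ":" $8          # e.g. row0ins.cc:2412
      waiters[loc]++
    }
    END {
      for (loc in waiters) printf "%s\t%d\n", loc, waiters[loc]
    }
  ' | sort                     # awk iteration order is unspecified, so sort
}

# Usage, assuming the status output was saved to a file beforehand:
#   count_semaphore_waiters < innodb_status.txt
```

Even this small filter depends on the exact wording of the status text, which is the kind of fragility a proper table would avoid.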
But wait, you say… there is a better way – you can use SHOW ENGINE INNODB MUTEX to get a summary of the stats:
mysql> SHOW ENGINE INNODB MUTEX;
+--------+----------------------------+-----------------+
| Type   | Name                       | Status          |
+--------+----------------------------+-----------------+
| InnoDB | rwlock: dict0dict.cc:2454  | waits=138       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=545       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=124       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=110       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=134       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=132       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=5317      |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=538       |
…
| InnoDB | rwlock: hash0hash.cc:171   | waits=219       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=291       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=290       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=312       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=281       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=226       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=327       |
| InnoDB | sum rwlock: buf0buf.cc:785 | waits=138699546 |
+--------+----------------------------+-----------------+
332 rows in set (0.59 sec)
This command does not provide the same information (it shows accumulated wait counts, not the waits currently in progress), but it is helpful. The problem is that it looks as if it was specially designed to be as useless as possible: note the many duplicates in “Name”. They appear because there are multiple instances of the same kind of mutex. In many cases, when you want to understand what kind of contention you’re dealing with, you would like to SUM the number of waits grouped by Name, and unfortunately, you can’t do that with SHOW commands.
Even stranger is the choice of the waits=N syntax in a column named “Status”, where relational database design would suggest a numeric column named “Waits” instead.
I would also prefer to see the sync object name here rather than the source code line, as the name is usually a lot more descriptive. MariaDB does this, BTW, and it also makes the data available as the INNODB_MUTEXES Information Schema table.
Finally, note how slow (read: expensive) this command is: 0.6 seconds with an 80GB buffer pool. The reason is that it captures mutex contention on buffer pool pages, which is super helpful for identifying page-specific contention but also requires aggregating information for potentially millions of objects, which takes time.
I think an INFORMATION_SCHEMA table would be a much better choice for showing this information too.
OK, so we can’t run a plain and simple SELECT on an INFORMATION_SCHEMA table to get the data in an easily digestible form, but maybe we should write a stored procedure instead?
This brings us to another design problem. While you can easily iterate over SELECT output in a stored procedure, the same does not work for SHOW commands. There probably was a good practical reason for this limitation, but it is not user-friendly at all.
When nothing else helps, there is always shell scripting, and we can use it to solve this problem too:
root@rocky:~# mysql -BNe "SHOW ENGINE INNODB MUTEX" | awk -F'\t' '{split($3, waits, "="); out[$2]+=waits[2];} END { for(el in out) printf "%s\t%d\n", el, out[el] } '
sum rwlock: buf0buf.cc:785	138984320
rwlock: btr0sea.cc:202	226767645
rwlock: trx0purge.cc:222	13
rwlock: ibuf0ibuf.cc:543	345822
rwlock: dict0dict.cc:2454	1958468
rwlock: dict0dict.cc:1042	66610
rwlock: fil0fil.cc:3150	1064444
rwlock: hash0hash.cc:171	37536
rwlock: dict0dict.cc:330	131
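If these counters are cumulative, as I read them, a single snapshot mostly reflects history, and the more interesting number is how much each one grew over an interval. Here is a rough sketch of that, wrapping the same awk logic in a function and diffing two aggregated snapshots. The helper names (sum_mutex_waits, mutex_wait_delta), the file names, and the 10-second interval are all made up for illustration.

```shell
# Sketch only: helper and file names are illustrative, not part of MySQL.

# Aggregate "Type<TAB>Name<TAB>waits=N" rows (mysql -BN output) into
# "Name<TAB>total" rows, using the same logic as the one-liner above.
sum_mutex_waits() {
  awk -F'\t' '
    { split($3, w, "="); total[$2] += w[2] }
    END { for (name in total) printf "%s\t%d\n", name, total[name] }
  '
}

# Print "Name<TAB>delta" for two aggregated snapshots, largest delta first.
# $1 = earlier snapshot file, $2 = later snapshot file.
# Assumes the earlier file is non-empty; names missing from it count from 0.
mutex_wait_delta() {
  awk -F'\t' '
    NR == FNR { before[$1] = $2; next }          # first file: remember totals
    { printf "%s\t%d\n", $1, $2 - before[$1] }   # second file: print growth
  ' "$1" "$2" | sort -t "$(printf '\t')" -k2,2nr
}

# Usage, assuming a reachable server and client credentials already set up:
#   mysql -BNe "SHOW ENGINE INNODB MUTEX" | sum_mutex_waits > before.tsv
#   sleep 10
#   mysql -BNe "SHOW ENGINE INNODB MUTEX" | sum_mutex_waits > after.tsv
#   mutex_wait_delta before.tsv after.tsv
```

Keep in mind that every snapshot pays the full cost of the SHOW command, so this is not something to run at a high frequency.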
However, if your database requires you to do that for basic grouping of the information it provides, there is something wrong here!
What would I like to see? I believe all SHOW statements should be reviewed, and if they are not planned for deprecation, information similar to what they provide should be made available from tables or views. In fact, this work was already done for the most common commands, but it looks like it was never completed.