
ProxySQL 2.3.0 was recently released, and while reading the release notes I was really impressed with the Group Replication enhancements and features. I decided to experiment with them and write a blog post about it. Here, I focus on the following two topics:
- When the replication lag threshold is reached, ProxySQL moves the server to the SHUNNED state instead of moving it to the OFFLINE hostgroup. Shunning a server is performed gracefully and does not immediately drop all backend connections.
- Servers can be put into maintenance mode through ProxySQL using “OFFLINE_SOFT”.
Test Environment
To test this, I configured a three-node GR cluster (gr1, gr2, gr3) in my local environment, running in single-primary mode (1 writer, 2 readers).
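For reference, this is roughly how the single-primary group was brought up. It is a minimal sketch that assumes the group_replication plugin, the replication user, and the group settings (group name, local address, seeds) are already configured in each node’s my.cnf:

-- On the first node (gr1), bootstrap the group:
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- On the remaining nodes (gr2, gr3), simply join the running group:
START GROUP_REPLICATION;

The resulting member state looks like this: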
mysql> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;
+-------------+--------------+-------------+----------------+
| member_host | member_state | member_role | member_version |
+-------------+--------------+-------------+----------------+
| gr1         | ONLINE       | PRIMARY     | 8.0.26         |
| gr2         | ONLINE       | SECONDARY   | 8.0.26         |
| gr3         | ONLINE       | SECONDARY   | 8.0.26         |
+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
Currently, there is no transaction delay in the GR cluster.
mysql> select * from sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| YES              | YES       | 0                   | 0                    |
+------------------+-----------+---------------------+----------------------+
1 row in set (0.00 sec)
To compare the behavior with older ProxySQL versions, I configured two ProxySQL instances: the latest version (2.3.0) and an older version (2.2.2).
ProxySQL 1: (2.3.0)
mysql> show variables like '%admin-version%';
+---------------+------------------+
| Variable_name | Value            |
+---------------+------------------+
| admin-version | 2.3.1-8-g794b621 |
+---------------+------------------+
1 row in set (0.00 sec)
ProxySQL 2: ( < 2.3.0 )
mysql> show variables like '%admin-version%';
+---------------+-------------------+
| Variable_name | Value             |
+---------------+-------------------+
| admin-version | 2.2.2-11-g0e7630d |
+---------------+-------------------+
1 row in set (0.01 sec)
The GR nodes are configured on both ProxySQL instances:
mysql> select hostgroup_id,hostname,status from runtime_mySQL_servers;
+--------------+----------+--------+
| hostgroup_id | hostname | status |
+--------------+----------+--------+
| 2            | gr1      | ONLINE |
| 3            | gr3      | ONLINE |
| 3            | gr2      | ONLINE |
+--------------+----------+--------+
3 rows in set (0.00 sec)
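The servers were added through the ProxySQL admin interface in the usual way; a sketch of the inserts, assuming the default MySQL port 3306:

INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (2, 'gr1', 3306), (3, 'gr2', 3306), (3, 'gr3', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;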
The hostgroup settings are:
mysql> select writer_hostgroup,reader_hostgroup,offline_hostgroup from runtime_mysql_group_replication_hostgroups\G
*************************** 1. row ***************************
 writer_hostgroup: 2
 reader_hostgroup: 3
offline_hostgroup: 4
1 row in set (0.00 sec)
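These hostgroups are defined in the mysql_group_replication_hostgroups table. Below is a sketch of such a definition; only the writer/reader/offline hostgroups (2/3/4) come from the output above, while the backup_writer_hostgroup (1) and the remaining column values are assumptions for illustration:

INSERT INTO mysql_group_replication_hostgroups
  (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup,
   active, max_writers, writer_is_also_reader, max_transactions_behind)
VALUES (2, 1, 3, 4, 1, 1, 0, 20);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;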
Scenario 1: When the replication lag threshold is reached, ProxySQL will move the server to the SHUNNED state, instead of moving it to the OFFLINE hostgroup.
Here, the replication lag threshold configured in ProxySQL is “20”.
mysql> select @@mysql-monitor_groupreplication_max_transactions_behind_count;
+-----------------------------------------------------------------+
| @@mysql-monitor_groupreplication_max_transactions_behind_count  |
+-----------------------------------------------------------------+
| 20                                                               |
+-----------------------------------------------------------------+
1 row in set (0.00 sec)
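If you need a different threshold, the variable can be changed from the ProxySQL admin interface (the UPDATE global_variables syntax works as well); a sketch:

SET mysql-monitor_groupreplication_max_transactions_behind_count = 20;
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;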
As per my current setting:
- On ProxySQL 2.3.0, if transactions_behind reaches 20, the node will be put into the “SHUNNED” state.
- On ProxySQL versions older than 2.3.0, if transactions_behind reaches 20, the node will be moved to the offline hostgroup.
To create replication lag manually, I am going to start a read/write load on the GR cluster using sysbench.
sysbench oltp_read_write --tables=10 --table_size=1000000 --mysql-host=172.28.0.96 --mysql-port=6033 --mysql-user=monitor --mysql-password="Monitor@321" --mysql-db=jc --time=30000 --threads=50 --report-interval=1 run
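If you want to reproduce this, the test tables must exist before the run; a prepare step with the same (environment-specific) connection parameters would look roughly like this:

sysbench oltp_read_write --tables=10 --table_size=1000000 --mysql-host=172.28.0.96 --mysql-port=6033 --mysql-user=monitor --mysql-password="Monitor@321" --mysql-db=jc prepare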
As expected, I can now see the transaction delay in the cluster.
mysql> select * from sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| YES              | YES       | 457                 | 0                    |
+------------------+-----------+---------------------+----------------------+
1 row in set (0.00 sec)
Let’s see how the different ProxySQL versions are behaving now.
At ProxySQL 2.3.0:
mysql> select hostgroup_id,hostname, status from runtime_mysql_servers;
+--------------+----------+---------+
| hostgroup_id | hostname | status  |
+--------------+----------+---------+
| 2            | gr1      | ONLINE  |
| 3            | gr2      | SHUNNED |
| 3            | gr3      | SHUNNED |
+--------------+----------+---------+
3 rows in set (0.00 sec)
As expected, both reader nodes (gr2, gr3) are moved to the “SHUNNED” state, and the servers remain in the reader hostgroup.
At “< ProxySQL 2.3.0”:
mysql> select hostgroup_id,hostname, status from runtime_mysql_servers;
+--------------+----------+---------+
| hostgroup_id | hostname | status  |
+--------------+----------+---------+
| 2            | gr1      | ONLINE  |
| 4            | gr2      | ONLINE  |
| 4            | gr3      | ONLINE  |
+--------------+----------+---------+
3 rows in set (0.00 sec)
The server status is still ONLINE, but the hostgroup_id has changed from 3 to 4; “4” is the offline hostgroup.
Comparing both results, the latest release (2.3.0) has the better implementation. Shunning a node just takes the server out of use temporarily until the replication lag issue is resolved, and the shunning is performed gracefully rather than immediately dropping all backend connections. You can see the servers are still present in the reader hostgroup. With the previous implementation, the servers are moved to the offline hostgroup immediately.
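ProxySQL’s monitor also logs what it observed for each backend during the GR checks, which is handy for confirming why a node was shunned. A query like the following against the admin interface should show the recent checks (the exact column layout may vary slightly between versions):

SELECT hostname, viable_candidate, read_only, transactions_behind, error
FROM monitor.mysql_server_group_replication_log
ORDER BY time_start_us DESC
LIMIT 3;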
Also, from ProxySQL 2.3.0 onwards, which nodes get shunned during the lag depends on the parameter “mysql-monitor_groupreplication_max_transactions_behind_for_read_only”. The parameter has three values (0, 1, 2); a configuration sketch follows the list.
- “0” means only servers with “read_only=0” are placed as SHUNNED.
- “1” means only servers with “read_only=1” are placed as SHUNNED. This is the default.
- “2” means both servers with “read_only=1” and “read_only=0” are placed as SHUNNED.
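For example, to shun lagging nodes regardless of their read_only setting, the variable could be switched to “2” from the admin interface; a sketch:

UPDATE global_variables SET variable_value = 2
WHERE variable_name = 'mysql-monitor_groupreplication_max_transactions_behind_for_read_only';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;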
Scenario 2: Servers can be put into maintenance through ProxySQL using “OFFLINE_SOFT”.
Personally, I would say this is one of the nicer implementations. From ProxySQL 2.3.0, ProxySQL itself can keep servers in maintenance mode using “OFFLINE_SOFT”. You could also set it in older versions, but the status was not stable. Let me explain the behavior of the latest and the older versions.
Both ProxySQL instances have the following configuration:
mysql> select hostgroup_id,hostname,status from runtime_mySQL_servers;
+--------------+----------+--------+
| hostgroup_id | hostname | status |
+--------------+----------+--------+
| 2            | gr1      | ONLINE |
| 3            | gr3      | ONLINE |
| 3            | gr2      | ONLINE |
+--------------+----------+--------+
3 rows in set (0.00 sec)
— Now, I am going to put the server “gr3” into maintenance mode on both ProxySQL instances.
After putting it into maintenance mode, both ProxySQL instances show the following output.
mysql> update mysql_servers set status='offline_soft' where hostname='gr3'; load mysql servers to runtime; save mysql servers to disk;
Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.04 sec)

mysql> select hostgroup_id,hostname,status from runtime_mySQL_servers;
+--------------+----------+--------------+
| hostgroup_id | hostname | status       |
+--------------+----------+--------------+
| 2            | gr1      | ONLINE       |
| 3            | gr3      | OFFLINE_SOFT |
| 3            | gr2      | ONLINE       |
+--------------+----------+--------------+
3 rows in set (0.00 sec)
— Now, I am going to stop the group replication service on “gr3”.
mysql> stop group_replication;
Query OK, 0 rows affected (4.59 sec)
Let’s check the ProxySQL status now.
At ProxySQL 2.3.0:
mysql> select hostgroup_id,hostname,status from runtime_mySQL_servers;
+--------------+----------+--------------+
| hostgroup_id | hostname | status       |
+--------------+----------+--------------+
| 2            | gr1      | ONLINE       |
| 3            | gr3      | OFFLINE_SOFT |
| 3            | gr2      | ONLINE       |
+--------------+----------+--------------+
3 rows in set (0.00 sec)
The latest release still maintains the same status. “gr3” is still in maintenance mode.
At “< ProxySQL 2.3.0”:
mysql> select hostgroup_id,hostname,status from runtime_mySQL_servers;
+--------------+----------+--------+
| hostgroup_id | hostname | status |
+--------------+----------+--------+
| 2            | gr1      | ONLINE |
| 4            | gr3      | ONLINE |
| 3            | gr2      | ONLINE |
+--------------+----------+--------+
3 rows in set (0.00 sec)
The older ProxySQL release removed the “OFFLINE_SOFT” flag from “gr3” and put it into the offline hostgroup (hg 4).
— Now, I am going to start the group_replication service on gr3 again.
mysql> start group_replication;
Query OK, 0 rows affected (2.58 sec)
At ProxySQL 2.3.0:
mysql> select hostgroup_id,hostname,status from runtime_mySQL_servers;
+--------------+----------+--------------+
| hostgroup_id | hostname | status       |
+--------------+----------+--------------+
| 2            | gr1      | ONLINE       |
| 3            | gr3      | OFFLINE_SOFT |
| 3            | gr2      | ONLINE       |
+--------------+----------+--------------+
3 rows in set (0.00 sec)
The latest release still maintains the same state.
At “< ProxySQL 2.3.0”:
mysql> select hostgroup_id,hostname,status from runtime_mySQL_servers;
+--------------+----------+--------+
| hostgroup_id | hostname | status |
+--------------+----------+--------+
| 2            | gr1      | ONLINE |
| 3            | gr3      | ONLINE |
| 3            | gr2      | ONLINE |
+--------------+----------+--------+
3 rows in set (0.00 sec)
On the older release, the server “gr3” came back ONLINE automatically. This is not the expected behavior, because we manually put that node into maintenance mode.
As you can see from the comparison of the latest and older releases, the latest release has the right implementation. To remove the maintenance mode, we have to manually update the status to “ONLINE”, as shown below.
mysql> update mysql_servers set status='online' where hostname='gr3'; load mysql servers to runtime; save mysql servers to disk;
Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.04 sec)

mysql> select hostgroup_id,hostname,status from runtime_mySQL_servers;
+--------------+----------+--------+
| hostgroup_id | hostname | status |
+--------------+----------+--------+
| 2            | gr1      | ONLINE |
| 3            | gr3      | ONLINE |
| 3            | gr2      | ONLINE |
+--------------+----------+--------+
3 rows in set (0.00 sec)
I believe these two new implementations are very helpful to those running a GR + ProxySQL setup. Apart from GR, the recent major releases have other important features as well. I will try to write about them in a future blog post, so be on the lookout for that.