
Percona XtraDB Cluster 8.0 comes with an upgraded Galera 4.0 library, which provides a new feature – streaming replication. Let’s review what it is and when it might be helpful.
Previous versions of Percona XtraDB Cluster with Galera 3.x had a limitation in how big transactions are handled: the whole transaction is replicated as a single writeset at commit time, and while that writeset is being certified and applied, the rest of the cluster can stall.
Let’s review the performance under a sysbench-tpcc workload when, in parallel, we run a big UPDATE on a table that is not even related to the tables in the primary workload.
Without Streaming Replication
Let’s run two workloads:
- a sysbench-tpcc workload with 1-second reporting resolution
- in parallel, UPDATE oltp.sbtest SET k=k+1 LIMIT 1000000
Running the update:
mysql> update sbtest1 set k=k+1 limit 1000000;
Query OK, 1000000 rows affected (34.48 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0
Check what is happening in sysbench-tpcc:
[ 77s ] thds: 100 tps: 7011.97 qps: 198248.21 (r/w/o: 90469.64/93758.63/14019.94) lat (ms,95%): 25.28 err/s 31.00 reconn/s: 0.00
[ 78s ] thds: 100 tps: 6779.94 qps: 196129.34 (r/w/o: 89462.24/93103.21/13563.88) lat (ms,95%): 26.20 err/s 30.00 reconn/s: 0.00
[ 79s ] thds: 100 tps: 6948.01 qps: 199157.35 (r/w/o: 90878.16/94383.16/13896.02) lat (ms,95%): 26.20 err/s 28.00 reconn/s: 0.00
[ 80s ] thds: 100 tps: 3920.09 qps: 113882.48 (r/w/o: 51940.13/54102.18/7840.17) lat (ms,95%): 27.17 err/s 15.00 reconn/s: 0.00
[ 81s ] thds: 100 tps: 67.00 qps: 1956.02 (r/w/o: 899.01/923.01/134.00) lat (ms,95%): 623.33 err/s 0.00 reconn/s: 0.00
[ 82s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 83s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 84s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 85s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 86s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 87s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 88s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 89s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 90s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 91s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 92s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 93s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 94s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 95s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 96s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 97s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 98s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 99s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 100s ] thds: 100 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00
[ 101s ] thds: 100 tps: 3501.85 qps: 99695.66 (r/w/o: 45473.02/47218.94/7003.70) lat (ms,95%): 257.95 err/s 14.00 reconn/s: 0.00
[ 102s ] thds: 100 tps: 6980.06 qps: 197777.73 (r/w/o: 90228.79/93588.82/13960.12) lat (ms,95%): 25.74 err/s 33.00 reconn/s: 0.00
[ 103s ] thds: 100 tps: 6745.15 qps: 196518.25 (r/w/o: 89717.94/93310.02/13490.29) lat (ms,95%): 26.68 err/s 46.00 reconn/s: 0.00
The update by itself took *34 sec*.
During this update, the main workload stopped for *22 sec*; basically, all queries were blocked for that long.
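As a side note (not part of the original runs), if you want to see whether such a stall coincides with Galera flow control kicking in, the standard wsrep status counters can be checked on the writer node:

-- Fraction of time (since the last FLUSH STATUS) this node was paused by flow control
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
-- Number of flow control pause events sent by this node
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_sent';
-- Number of flow control pause events received from other nodes
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_recv';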
With Streaming Replication
How can this be improved with streaming replication?
- Let’s enable streaming replication for the session in which we will run the update:
SET SESSION wsrep_trx_fragment_unit='rows';
SET SESSION wsrep_trx_fragment_size=1000;
Basically, we tell the cluster to split the big transaction into chunks of 1000 rows each and replicate it in these smaller fragments. Besides ‘rows’, the fragment unit can also be ‘bytes’ or ‘statements’.
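To double-check the settings for the current session, you can query the variables directly; wsrep_trx_fragment_size=0, which is the default, means streaming replication is disabled:

SHOW SESSION VARIABLES LIKE 'wsrep_trx_fragment%';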
And run the query:
mysql> update sbtest1 set k=k+1 limit 1000000;
Query OK, 1000000 rows affected (39.76 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0
In sysbench-tpcc:
[ 81s ] thds: 100 tps: 6682.94 qps: 188552.70 (r/w/o: 85967.65/89221.16/13363.88) lat (ms,95%): 26.68 err/s 32.98 reconn/s: 0.00
[ 82s ] thds: 100 tps: 6700.92 qps: 192216.77 (r/w/o: 87715.23/91103.70/13397.84) lat (ms,95%): 27.17 err/s 27.01 reconn/s: 0.00
[ 83s ] thds: 100 tps: 3835.05 qps: 108387.43 (r/w/o: 49408.65/51302.68/7676.10) lat (ms,95%): 82.96 err/s 15.00 reconn/s: 0.00
[ 84s ] thds: 100 tps: 2210.13 qps: 63161.58 (r/w/o: 28852.64/29888.70/4420.25) lat (ms,95%): 95.81 err/s 9.00 reconn/s: 0.00
[ 85s ] thds: 100 tps: 2558.00 qps: 72592.08 (r/w/o: 33093.04/34383.04/5116.01) lat (ms,95%): 87.56 err/s 9.00 reconn/s: 0.00
[ 86s ] thds: 100 tps: 2617.99 qps: 75127.81 (r/w/o: 34299.91/35591.91/5235.99) lat (ms,95%): 78.60 err/s 9.00 reconn/s: 0.00
[ 87s ] thds: 100 tps: 2887.75 qps: 81760.97 (r/w/o: 37312.79/38672.68/5775.50) lat (ms,95%): 73.13 err/s 15.00 reconn/s: 0.00
[ 88s ] thds: 100 tps: 3024.00 qps: 84461.96 (r/w/o: 38606.98/39806.98/6048.00) lat (ms,95%): 69.29 err/s 15.00 reconn/s: 0.00
[ 89s ] thds: 100 tps: 3119.27 qps: 91128.99 (r/w/o: 41566.65/43323.80/6238.55) lat (ms,95%): 63.32 err/s 9.00 reconn/s: 0.00
[ 90s ] thds: 100 tps: 3385.74 qps: 98314.42 (r/w/o: 44883.54/46659.40/6771.48) lat (ms,95%): 56.84 err/s 14.00 reconn/s: 0.00
[ 91s ] thds: 100 tps: 3641.08 qps: 103916.20 (r/w/o: 47422.00/49212.04/7282.15) lat (ms,95%): 54.83 err/s 21.00 reconn/s: 0.00
[ 92s ] thds: 100 tps: 3850.12 qps: 106013.43 (r/w/o: 48296.56/50021.62/7695.25) lat (ms,95%): 57.87 err/s 23.00 reconn/s: 0.00
[ 93s ] thds: 100 tps: 3828.07 qps: 111682.90 (r/w/o: 51005.87/53015.90/7661.13) lat (ms,95%): 54.83 err/s 22.00 reconn/s: 0.00
[ 94s ] thds: 100 tps: 4358.95 qps: 122173.63 (r/w/o: 55746.37/57709.35/8717.90) lat (ms,95%): 42.61 err/s 14.00 reconn/s: 0.00
[ 95s ] thds: 100 tps: 4367.09 qps: 123297.63 (r/w/o: 56193.20/58370.24/8734.19) lat (ms,95%): 44.98 err/s 16.00 reconn/s: 0.00
[ 96s ] thds: 100 tps: 4272.92 qps: 118822.67 (r/w/o: 54076.94/56201.90/8543.83) lat (ms,95%): 46.63 err/s 24.00 reconn/s: 0.00
[ 97s ] thds: 100 tps: 4697.88 qps: 133071.68 (r/w/o: 60676.49/62997.43/9397.77) lat (ms,95%): 38.25 err/s 17.00 reconn/s: 0.00
[ 98s ] thds: 100 tps: 4742.21 qps: 135167.87 (r/w/o: 61693.68/63989.78/9484.41) lat (ms,95%): 37.56 err/s 21.00 reconn/s: 0.00
[ 99s ] thds: 100 tps: 4949.89 qps: 139343.00 (r/w/o: 63616.63/65826.58/9899.79) lat (ms,95%): 36.24 err/s 21.00 reconn/s: 0.00
[ 100s ] thds: 100 tps: 4766.10 qps: 139554.99 (r/w/o: 63695.37/66327.42/9532.20) lat (ms,95%): 36.89 err/s 18.00 reconn/s: 0.00
[ 101s ] thds: 100 tps: 5069.91 qps: 143318.44 (r/w/o: 65310.83/67867.79/10139.82) lat (ms,95%): 35.59 err/s 13.00 reconn/s: 0.00
[ 102s ] thds: 100 tps: 4947.06 qps: 140053.63 (r/w/o: 63820.74/66338.77/9894.12) lat (ms,95%): 36.24 err/s 23.00 reconn/s: 0.00
[ 103s ] thds: 100 tps: 5045.00 qps: 145397.93 (r/w/o: 66328.97/68978.97/10090.00) lat (ms,95%): 34.33 err/s 18.00 reconn/s: 0.00
[ 104s ] thds: 100 tps: 5139.02 qps: 141954.54 (r/w/o: 64723.25/66953.25/10278.04) lat (ms,95%): 36.24 err/s 28.00 reconn/s: 0.00
[ 105s ] thds: 100 tps: 5214.90 qps: 147582.10 (r/w/o: 67371.68/69780.63/10429.80) lat (ms,95%): 34.33 err/s 25.00 reconn/s: 0.00
[ 106s ] thds: 100 tps: 4924.08 qps: 139603.33 (r/w/o: 63673.06/66082.10/9848.16) lat (ms,95%): 36.24 err/s 23.00 reconn/s: 0.00
[ 107s ] thds: 100 tps: 5202.97 qps: 147199.09 (r/w/o: 67176.58/69616.57/10405.94) lat (ms,95%): 34.33 err/s 30.00 reconn/s: 0.00
[ 108s ] thds: 100 tps: 5219.91 qps: 147677.47 (r/w/o: 67416.84/69820.80/10439.82) lat (ms,95%): 33.72 err/s 28.00 reconn/s: 0.00
[ 109s ] thds: 100 tps: 5018.99 qps: 143211.61 (r/w/o: 65365.82/67808.81/10036.97) lat (ms,95%): 36.24 err/s 23.00 reconn/s: 0.00
[ 110s ] thds: 100 tps: 5070.16 qps: 142049.54 (r/w/o: 64817.07/67091.15/10141.32) lat (ms,95%): 34.95 err/s 17.00 reconn/s: 0.00
[ 111s ] thds: 100 tps: 4954.87 qps: 141476.26 (r/w/o: 64529.29/67037.23/9909.74) lat (ms,95%): 35.59 err/s 25.00 reconn/s: 0.00
[ 112s ] thds: 100 tps: 4827.12 qps: 140426.46 (r/w/o: 64103.58/66668.64/9654.24) lat (ms,95%): 35.59 err/s 19.00 reconn/s: 0.00
[ 113s ] thds: 100 tps: 5027.00 qps: 145229.08 (r/w/o: 66179.04/68996.04/10054.01) lat (ms,95%): 34.33 err/s 26.00 reconn/s: 0.00
[ 114s ] thds: 100 tps: 5099.87 qps: 144585.36 (r/w/o: 65976.34/68409.28/10199.74) lat (ms,95%): 34.33 err/s 26.00 reconn/s: 0.00
[ 115s ] thds: 100 tps: 5010.11 qps: 143316.08 (r/w/o: 65356.40/67939.46/10020.22) lat (ms,95%): 34.95 err/s 26.00 reconn/s: 0.00
[ 116s ] thds: 100 tps: 5056.00 qps: 143686.98 (r/w/o: 65621.99/67952.99/10112.00) lat (ms,95%): 34.95 err/s 31.00 reconn/s: 0.00
[ 117s ] thds: 100 tps: 4908.95 qps: 141669.49 (r/w/o: 64653.31/67198.28/9817.90) lat (ms,95%): 36.24 err/s 21.00 reconn/s: 0.00
[ 118s ] thds: 100 tps: 5039.07 qps: 142667.01 (r/w/o: 65056.92/67531.95/10078.14) lat (ms,95%): 34.33 err/s 24.00 reconn/s: 0.00
[ 119s ] thds: 100 tps: 5076.89 qps: 143205.79 (r/w/o: 65195.54/67856.48/10153.77) lat (ms,95%): 35.59 err/s 18.00 reconn/s: 0.00
[ 120s ] thds: 100 tps: 4909.09 qps: 137380.48 (r/w/o: 62539.13/65024.17/9817.18) lat (ms,95%): 34.95 err/s 13.00 reconn/s: 0.00
[ 121s ] thds: 100 tps: 5024.93 qps: 144610.91 (r/w/o: 66027.05/68533.01/10050.85) lat (ms,95%): 35.59 err/s 23.00 reconn/s: 0.00
[ 122s ] thds: 100 tps: 4874.10 qps: 138066.96 (r/w/o: 62942.35/65376.40/9748.21) lat (ms,95%): 35.59 err/s 16.00 reconn/s: 0.00
[ 123s ] thds: 100 tps: 6745.83 qps: 187288.34 (r/w/o: 85354.88/88441.80/13491.66) lat (ms,95%): 28.67 err/s 27.00 reconn/s: 0.00
[ 124s ] thds: 100 tps: 6132.03 qps: 172854.73 (r/w/o: 78867.33/81723.34/12264.05) lat (ms,95%): 29.19 err/s 18.00 reconn/s: 0.00
[ 125s ] thds: 100 tps: 6114.99 qps: 175777.68 (r/w/o: 80098.85/83448.85/12229.98) lat (ms,95%): 30.26 err/s 39.00 reconn/s: 0.00
[ 126s ] thds: 100 tps: 6206.87 qps: 179830.22 (r/w/o: 82043.28/85374.21/12412.74) lat (ms,95%): 29.72 err/s 29.00 reconn/s: 0.00
[ 127s ] thds: 100 tps: 6441.25 qps: 181759.03 (r/w/o: 82799.20/86076.33/12883.50) lat (ms,95%): 28.67 err/s 28.00 reconn/s: 0.00
[ 128s ] thds: 100 tps: 5925.87 qps: 169978.30 (r/w/o: 77565.31/80561.24/11851.74) lat (ms,95%): 30.81 err/s 32.00 reconn/s: 0.00
[ 129s ] thds: 100 tps: 6614.92 qps: 186834.71 (r/w/o: 85216.96/88388.92/13228.84) lat (ms,95%): 27.66 err/s 24.00 reconn/s: 0.00
So what happened here:
The update query took a little longer (39 sec instead of 34 sec). The main workload also took a hit (dropping from about 6700 tps to 2210 tps at the worst point), but it did not stop completely, which is a huge improvement.
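One more observation you can make (not shown in the runs above): while a streaming transaction is in flight, its replicated fragments are persisted on every node in the mysql.wsrep_streaming_log table that Galera 4 adds, so you can verify from another node that fragments are actually arriving. The table is only non-empty while such a transaction is running:

SELECT COUNT(*) FROM mysql.wsrep_streaming_log;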
Why should we not enable streaming by default for all transactions? The reason is the extra overhead: each fragment is replicated and certified separately and persisted on every node, which may negatively impact regular small transactions. So it is advisable to use streaming replication only for big or long-running transactions.
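In practice, that means enabling streaming only around the transaction that needs it. A minimal sketch of such a session (the table name and fragment size are simply the ones used above):

-- Enable streaming only for this session, just for the big statement
SET SESSION wsrep_trx_fragment_unit='rows';
SET SESSION wsrep_trx_fragment_size=1000;

UPDATE sbtest1 SET k=k+1 LIMIT 1000000;

-- Switch streaming back off for this session (0 is the default)
SET SESSION wsrep_trx_fragment_size=0;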