
Storing your data only locally can pose security and availability risks. Major cloud providers offer object storage services that let you upload and distribute data across different regions with various retention and restore policies.
Percona XtraBackup delivers the xbcloud binary – an auxiliary tool to allow users to upload backups to different cloud providers directly.
Today we are glad to announce the introduction of the Exponential Backoff feature to xbcloud.
In short, this new feature makes your backup uploads/downloads work better over unstable network connections by retrying each chunk and adding an exponential wait time between retries, increasing the chances of completion in case of a network glitch.
This new functionality is available on today’s release of Percona XtraBackup 8.0.26 and will be available in Percona XtraBackup 2.4.24.
How it Works – in General
Whenever a chunk upload or download fails, xbcloud checks the reason for the failure. It can be either a CURL/HTTP error or a client-specific error. If the error is listed as retriable (more about that later in this post), xbcloud will back off/sleep for a certain amount of time before trying again. It will retry the same chunk 10 times before aborting the whole process; 10 is the default retry count and can be configured via the --max-retries parameter.
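To make this flow concrete, here is a minimal sketch of such a per-chunk retry loop in C++. It only illustrates the behavior described above and is not xbcloud's actual source code; transfer_chunk, error_is_retriable, backoff_time, and upload_with_retries are hypothetical stand-ins.

#include <chrono>
#include <cstdio>
#include <thread>

// Stubs standing in for the real upload and the CURL/HTTP allow-list check,
// so the example stays self-contained.
static int failures_left = 3;  // pretend the first three attempts fail
static bool transfer_chunk(const char *chunk) { (void)chunk; return failures_left-- <= 0; }
static bool error_is_retriable() { return true; }

// Placeholder pause; the exponential version is sketched in the next section.
static std::chrono::milliseconds backoff_time(int retries) {
    return std::chrono::milliseconds(100 * retries);
}

// Try the chunk, retrying up to max_retries times (the --max-retries default
// is 10), sleeping between attempts; abort on a non-retriable error or once
// the retry budget is exhausted.
static bool upload_with_retries(const char *chunk, int max_retries = 10) {
    for (int retries = 0; ; ++retries) {
        if (transfer_chunk(chunk)) return true;     // chunk transferred, done
        if (!error_is_retriable()) return false;    // non-retriable error: abort
        if (retries >= max_retries) return false;   // retries exhausted: abort
        std::chrono::milliseconds pause = backoff_time(retries + 1);
        std::printf("Sleeping for %lld ms before retrying %s [%d]\n",
                    static_cast<long long>(pause.count()), chunk, retries + 1);
        std::this_thread::sleep_for(pause);
    }
}

int main() {
    return upload_with_retries("xtrabackup_logfile.00000000000000000006") ? 0 : 1;
}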
How it Works – Backoff Algorithm
Network glitches/instabilities usually last only a short period of time. To make the xbcloud tool more reliable and increase the chances of a backup upload/download completing during those instabilities, we pause for a certain period of time before retrying the same chunk. The algorithm chosen is known as exponential backoff.
In the case of a retry, we calculate a power of two, using the number of retries already made for that specific chunk as the exponent. Since xbcloud performs multiple asynchronous requests in parallel, we also add a random number of milliseconds between 1 and 1000 to each chunk's wait time. This avoids all asynchronous requests backing off for the same amount of time and retrying at once, which could cause network congestion.
The backoff time will keep increasing as the same chunk keeps failing to upload/download. Taking, for example, the default --max-retries of 10, the last backoff would be around 17 minutes (2^10 = 1024 seconds). To overcome this, we have implemented the --max-backoff parameter. It defines the maximum time, in milliseconds, the program can sleep between chunk retries, and defaults to 300000 (5 minutes).
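Below is a rough sketch of how such a capped backoff could be computed. It is inferred from the description and the example log later in this post rather than taken from the xbcloud source; the backoff_time function and the exact formula (2^retries seconds plus 1-1000 ms of jitter) are assumptions.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <random>

// Approximate backoff: 2^retries seconds, plus a random 1-1000 ms of jitter so
// parallel chunk transfers do not all retry at the same moment, capped at
// --max-backoff milliseconds (default 300000, i.e. 5 minutes).
std::chrono::milliseconds backoff_time(int retries, std::int64_t max_backoff_ms = 300000) {
    static std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<std::int64_t> jitter_ms(1, 1000);

    std::int64_t pause_ms = (std::int64_t{1} << retries) * 1000 + jitter_ms(rng);

    // Never sleep longer than --max-backoff between retries.
    return std::chrono::milliseconds(std::min(pause_ms, max_backoff_ms));
}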
How it Works – Retriable Errors
We have a set of errors on which we know the operation should be retried. For CURL, we retry on:
- CURLE_GOT_NOTHING
- CURLE_OPERATION_TIMEDOUT
- CURLE_RECV_ERROR
- CURLE_SEND_ERROR
- CURLE_SEND_FAIL_REWIND
- CURLE_PARTIAL_FILE
- CURLE_SSL_CONNECT_ERROR
For HTTP, we retry the operation in case of the following status codes:
- 503
- 500
- 504
- 408
Each cloud provider might return a different CURL or HTTP error depending on the issue. So that users can extend this list without having to wait for a new version of xbcloud, we created a mechanism to do exactly that: new errors can be added via the --curl-retriable-errors and --http-retriable-errors parameters, respectively.
On top of that, we have enhanced the error handling of the --verbose output to specify on which error xbcloud failed and which parameter a user has to add to retry on that error. Here is one example:
210701 14:34:23 /work/pxb/ins/8.0/bin/xbcloud: Operation failed. Error: Server returned nothing (no headers, no data)
210701 14:34:23 /work/pxb/ins/8.0/bin/xbcloud: Curl error (52) Server returned nothing (no headers, no data) is not configured as retriable. You can allow it by adding --curl-retriable-errors=52 parameter
Those options accept a comma-separated list of error codes.
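As an illustration (an assumed, simplified model rather than the actual xbcloud implementation), the built-in allow-lists and this extension mechanism could look like the sketch below; extend_retriable and is_retriable are hypothetical helpers, while the numeric values are the standard libcurl error codes and the HTTP status codes listed above.

#include <set>
#include <sstream>
#include <string>

// Default retriable CURL error codes, by their standard libcurl numbers:
// 52 CURLE_GOT_NOTHING, 28 CURLE_OPERATION_TIMEDOUT, 56 CURLE_RECV_ERROR,
// 55 CURLE_SEND_ERROR, 65 CURLE_SEND_FAIL_REWIND, 18 CURLE_PARTIAL_FILE,
// 35 CURLE_SSL_CONNECT_ERROR.
std::set<long> curl_retriable = {52, 28, 56, 55, 65, 18, 35};

// Default retriable HTTP status codes.
std::set<long> http_retriable = {503, 500, 504, 408};

// Merge in the comma-separated codes given via --curl-retriable-errors or
// --http-retriable-errors, e.g. "52" or "52,6".
void extend_retriable(std::set<long> &codes, const std::string &option_value) {
    std::stringstream ss(option_value);
    std::string token;
    while (std::getline(ss, token, ','))
        if (!token.empty()) codes.insert(std::stol(token));
}

// A failed chunk is retried only if its error is in one of the allow-lists.
bool is_retriable(long curl_code, long http_status) {
    return curl_retriable.count(curl_code) != 0 || http_retriable.count(http_status) != 0;
}

For instance, extend_retriable(curl_retriable, "6") would add CURLE_COULDNT_RESOLVE_HOST (libcurl code 6) to the allow-list, mirroring what passing that code to --curl-retriable-errors is meant to do.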
Example
Below is one example of xbcloud exponential backoff in practice, used with --max-retries=5 --max-backoff=10000:
210702 10:07:05 /work/pxb/ins/8.0/bin/xbcloud: Operation failed. Error: Server returned nothing (no headers, no data)
210702 10:07:05 /work/pxb/ins/8.0/bin/xbcloud: Sleeping for 2384 ms before retrying backup3/xtrabackup_logfile.00000000000000000006 [1]
. . .
210702 10:07:23 /work/pxb/ins/8.0/bin/xbcloud: Operation failed. Error: Server returned nothing (no headers, no data)
210702 10:07:23 /work/pxb/ins/8.0/bin/xbcloud: Sleeping for 4387 ms before retrying backup3/xtrabackup_logfile.00000000000000000006 [2]
. . .
210702 10:07:52 /work/pxb/ins/8.0/bin/xbcloud: Operation failed. Error: Failed sending data to the peer
210702 10:07:52 /work/pxb/ins/8.0/bin/xbcloud: Sleeping for 8691 ms before retrying backup3/xtrabackup_logfile.00000000000000000006 [3]
. . .
210702 10:08:47 /work/pxb/ins/8.0/bin/xbcloud: Operation failed. Error: Failed sending data to the peer
210702 10:08:47 /work/pxb/ins/8.0/bin/xbcloud: Sleeping for 10000 ms before retrying backup3/xtrabackup_logfile.00000000000000000006 [4]
. . .
210702 10:10:12 /work/pxb/ins/8.0/bin/xbcloud: successfully uploaded chunk: backup3/xtrabackup_logfile.00000000000000000006, size: 8388660
Let’s analyze the log snippet above:
- Chunk xtrabackup_logfile.00000000000000000006 failed to upload for the first time (as seen in the [1] above) and the process slept for 2384 milliseconds.
- Then the same chunk failed for the second time (as seen by the number within [ ]), with the sleep time roughly doubling.
- When the chunk failed for the third time, the sleep time continued to grow exponentially, to around 8 seconds.
- On the fourth retry, the sleep time would originally have grown to around 16 seconds; however, we used --max-backoff=10000, which caps the sleep time between retries, so the program waited 10 seconds before trying the same chunk again.
- In the end, we can see that chunk xtrabackup_logfile.00000000000000000006 was uploaded successfully.
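For a quick sanity check, the small snippet below compares the logged sleep times with the ranges predicted by the approximate formula sketched earlier (min(2^n x 1000 + 1..1000 ms of jitter, --max-backoff)). It only verifies the arithmetic of this example and is not part of xbcloud.

#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
    const std::int64_t max_backoff_ms = 10000;                    // --max-backoff=10000
    const std::int64_t logged_ms[] = {2384, 4387, 8691, 10000};   // values from the log above
    for (int n = 1; n <= 4; ++n) {
        // Expected range: 2^n seconds plus 1-1000 ms of jitter, capped at --max-backoff.
        std::int64_t lo = std::min((std::int64_t{1} << n) * 1000 + 1, max_backoff_ms);
        std::int64_t hi = std::min((std::int64_t{1} << n) * 1000 + 1000, max_backoff_ms);
        std::printf("retry %d: expected %lld-%lld ms, logged %lld ms\n",
                    n, (long long)lo, (long long)hi, (long long)logged_ms[n - 1]);
    }
    return 0;
}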
Summary
Best practices recommend distributing your backups across different locations, and cloud providers have dedicated services for this purpose. xbcloud, alongside Percona XtraBackup, gives you the tools to meet this requirement for MySQL backups. On the other hand, we know that network connectivity can be unstable at the worst times. The new version of xbcloud won’t stop you from completing your backups, as it is more resilient to those instabilities and offers a variety of options to tune the network transfer.
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!