
pt-online-schema-change emulates the way that MySQL alters tables internally, but it works on a copy of the table you wish to alter. It executes INSERT statements to import the data, that runs in a single connection to fill the new table. In this repository, there is a script called myloader_pt-osc.sh that uses myloader to execute parallel INSERTs, instead of using pt-online-schema-change internal tooling, but it also uses a patched version of pt-online-schema-change.
Patch pt-online-schema-change
There are 2 changes that we needed to perform on pt-online-schema-change:
1- Be able to stop the execution after the copy of the table is created
2- Tell pt-online-schema-change that no data needs to be imported
We are adding the –no-data to tell pt-online-schema-change to do not import the data to the new table. pt-online-schema-change is able to stop the execution with –pause-file option. The patch file includes the code to stop the execution when –no-data and –pause-file is used.
Procedure
The script is performing several tasks, the main ones are:
- Creating a backup directory in a format that myloader is able to use.
- Starting the pt-online-schema-change which is going to stop after creating the new table and triggers.
- Start the myloader execution.
- Resume pt-online-schema-change, after myloader finishes, to swap the tables.
Take into consideration that the script is not designed to support multi-column nor non-integer primary keys. Do not use it in production environments without testing it thoroughly.
Timings
A sysbench table with 10M rows was created to perform an alter table with “engine innodb” which is like an optimize table, I used these two executions to compare:
~/git/myloader_pt-osc/myloader_pt-osc.sh -D sbtest -T sbtest1 --alter 'engine innodb' --chunk-size 10000 -t 10
pt-online-schema-change --alter 'engine innodb' --execute D=sbtest,t=sbtest1 --chunk-size 10000
The chunk-size and the number of threads were variables to create these graphs:
In the next graph is the improvement in percentage:
Use Cases
I consider that this method or strategy should be used on specific scenarios, as there are considerations when not to use it, for instance:
- The pt-online-schema-change throttle mechanism will not slow down the insert, slaves might get behind.
- As the intention is to use multiple threads, the server will receive more load and no mechanism in place to stop it.
Conclusion and Expectation
Basically, it is a pt-online-schema-change wrapper that uses myloader to perform the INSERTs in parallel which makes a faster alter table. It might be used when times need to be reduced and the load is not an issue.
I would like to see this implemented inside pt-online-schema-change as a possible strategy but until this happens, this might be an alternative.