
For a long time, MyDumper has been the fastest tool to take logical backups, and we have been adding features to expand its use cases. Masquerade was one of these features, but it only covered integer and UUID values. In this blog post, I'm going to present new functionality, already merged into MyDumper and available in the next release: the ability to build random data based on a format that the user defines.
How does it work?
During export, mydumper sends SELECT statements to the database, and each row is written one by one as an INSERT statement. Something important that you might not know is that each column of a row can be transformed by a function. When you execute a backup, the default is the identity function, as nothing needs to be changed. The function, which can be configured inside the defaults file, changes the content of the column before the row is written to disk.
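As a conceptual illustration (the table, column, and masked value below are made up, not literal MyDumper output): if the server returns the row (1,'john@example.com') for a hypothetical `db`.`users` table, a masking function configured on the `email` column would make mydumper write

INSERT INTO `users` VALUES(1,'xkqpd@rmtwz.net');

to disk instead of the original value.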
How can we select the column to masquerade?
I think the most valuable element of this feature is how simple it is to define which columns will be modified and how you want to mask them. The format is:
[`schema_name`.`table_name`] `column1`=random_int `column2`=random_string
In the section name, you add the schema and table name, each surrounded by backticks and separated by a dot. Then, in each key-value entry, the key is the column name surrounded by backticks, and the value is the masking function definition.
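For example, using the pre-existing random_int and random_string functions on a hypothetical `prod`.`users` table, the defaults file would contain a section like:

[`prod`.`users`]
`id`=random_int
`name`=random_string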
New random format function
Having string, integer, and UUID masking is nice, but what about building dynamic data with a specific format? As we want more realistic data, we want to dynamically build worldwide addresses, phone numbers, emails, etc. The new function has this syntax:
random_format { <{file|string n|number n}> | DELIMITER | 'CONSTANT' }*
These are some examples:
`phone`=random_format '+1 ('<number 3>') '<number 3>'-'<number 4>
`emails`=random_format <file names.txt>'.'<file surnames.txt>'@'<file domains.txt>
`addresses`=random_format <number 3>' '<file streets.txt>', '<file cities.txt>', '<file states_and_zip.txt>', USA'
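Judging by the examples later in this post, each <file> tag picks a line from the given file, so those files are just plain-text lists with one value per line; for instance, the domains.txt used in the emails example could simply be:

gmail.com
outlook.com
yahoo.com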
Performance considerations
You should expect performance degradation when you compare masqueraded backups against regular backups. It is impossible to measure the impact in general, as it depends on the amount of data that needs to be masked. However, I will try to give you an idea through an example over a sysbench table of 10M rows.
Baseline backup
We are going to split by rows and compress with ZSTD:
# rm -rf data/; time ./mydumper -o data -B test --defaults-file=mydumper.cnf -r 100000 -c

real    0m19.964s
user    0m48.396s
sys     0m7.885s
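For reference, the baseline mydumper.cnf contains no masking section at all; a minimal defaults file could just carry connection settings, for example (the values below are placeholders):

[mydumper]
host = 127.0.0.1
user = root
password = secret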
It took about 19.9 seconds to complete, and here is an example of the output:
# zstdcat data/test.sbtest1.00000.sql.zst | grep INSERT -A10 | head
INSERT INTO `sbtest1` VALUES(1,4992833,"83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330","67847967377-48000963322-62604785301-91415491898-96926520291")
,(2,5019684,"38014276128-25250245652-62722561801-27818678124-24890218270-18312424692-92565570600-36243745486-21199862476-38576014630","23183251411-36241541236-31706421314-92007079971-60663066966")
One integer column
We are going to use random_int over the k column, which in the configuration (mydumper-k.cnf) will be:
[`test`.`sbtest1`]
`k`=random_int
The backup took 20.7 seconds, an increase of 4%:
# rm -rf data/; time ./mydumper -o data -B test --defaults-file=mydumper-k.cnf -r 100000 -c

real    0m20.709s
user    0m46.056s
sys     0m11.247s
And as you can see, the data in the second column has changed:
# zstdcat data/test.sbtest1.00000.sql.zst | grep INSERT -A10 | head
INSERT INTO `sbtest1` VALUES(1,1527173,"83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330","67847967377-48000963322-62604785301-91415491898-96926520291")
,(2,3875126,"38014276128-25250245652-62722561801-27818678124-24890218270-18312424692-92565570600-36243745486-21199862476-38576014630","23183251411-36241541236-31706421314-92007079971-60663066966")
random_format with <number 11>
Now, we are going to mask the last column (pad), using the number tag with 11 digits to simulate the original values:
`pad`=random_format <number 11>-<number 11>-<number 11>-<number 11>-<number 11>
We can see that it took 36.6 seconds to complete, and the values in the last column have changed:
# rm -rf data/; time ./mydumper -o data -B test --defaults-file=mydumper-pad-long.cnf -r 100000 -c

real    0m36.667s
user    1m3.785s
sys     0m32.757s

# zstdcat data/test.sbtest1.00000.sql.zst | grep INSERT -A10 | head
INSERT INTO `sbtest1` VALUES(1,4992833,"83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330","32720009027-12540600353-41008809903-18811191622-46944507919")
,(2,5019684,"38014276128-25250245652-62722561801-27818678124-24890218270-18312424692-92565570600-36243745486-21199862476-38576014630","14761241271-79422723442-42242331639-12424460062-25625932261")
Take into consideration that 11 digits forced g_random_int to be executed twice per tag, as a single call returns a 32-bit value, which only guarantees nine full decimal digits. This means that if we instead use:
`pad`=random_format <number 9>-<number 9>-<number 9>-<number 9>-<number 9>
It will take 29 seconds.
random_format with <file> using a 100-line file
In this case, the configuration will be:
`pad`=random_format <file words_alpha.txt.100>-<file words_alpha.txt.100>-<file words_alpha.txt.100>-<file words_alpha.txt.100>-<file words_alpha.txt.100>
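The 100-line file can be created from any word list; for example, assuming words_alpha.txt is a one-word-per-line dictionary file:

# head -n 100 words_alpha.txt > words_alpha.txt.100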
And it will take 34 seconds:
# rm -rf data/; time ./mydumper -o data -B test --defaults-file=mydumper-simple-pad.cnf -r 100000 -c

real    0m34.224s
user    0m56.702s
sys     0m29.474s

# zstdcat data/test.sbtest1.00000.sql.zst | grep INSERT -A10 | head
INSERT INTO `sbtest1` VALUES(1,4992833,"83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330","aam-abacot-abalienated-abandonedly-ab")
,(2,5019684,"38014276128-25250245652-62722561801-27818678124-24890218270-18312424692-92565570600-36243745486-21199862476-38576014630","aardwolves-abaised-abandoners-aaronitic-abacterial")
Warning
This feature is not fully tested in MyDumper yet; you should consider it Beta. However, I found it worth showing the potential it might have for the community.
Conclusion
It has never been easier to build a masqueraded environment than it is now with MyDumper.