Quantcast
Channel: Percona Database Performance Blog
Viewing all articles
Browse latest Browse all 1786

myloader Stops Causing Data Fragmentation

$
0
0
myloader Stops Causing Data Fragmentation

myloader Stops Causing Data FragmentationDuring the development of the myloader –innodb-optimize-keys option, which was released in version 0.10.7, we found several issues and opportunities to improve the process. We had to change the approach, reimplement some of the core functionality and add a couple of data structures. That allowed us to implement, at a really low cost, a feature that executes the files that contain INSERT statements, sorted by Primary Key. This is desirable to reduce page splits, which cause on-disk tablespace fragmentation.

In this blog post, I will present the differences in data fragmentation for each version.

Test Details

These are local vm tests as there is no intention to show performance gain.

The table that I used is:

CREATE TABLE `perf_test` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `val` varchar(108) DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `val` (`val`(2)),
 KEY `val_2` (`val`(4)),
 KEY `val_3` (`val`(8))
) ENGINE=InnoDB

And I inserted the data with:

INSERT INTO perf_test(val) SELECT concat(uuid(),uuid(),uuid()) FROM perf_test;

The graphs below were made with innodb_ruby (more info about it in this blog post) and based on a table of 131K rows with –rows 100. The intention of this test was to create a lot of files that will cause better spread in the Primary Key. The timings are over the same table structure but the table has 32M rows. Finally, I performed the test with 1 and 4 threads and with –innodb-optimize-keys when possible.

Tests Performed

In myloader v0.10.5 there was no file sorting, which is why we can see that lower Primary Key values were updated recently:

   

It doesn’t matter the number of threads, we can see how pages, across the whole file, are being updated at any time. 

This is happening because mydumper exported the files in order with these min_id and max_id values:

File min_id max_id
test.perf_test.00000.sql 1 21261
test.perf_test.00001.sql 21262 42522
test.perf_test.00002.sql 42523 49137
test.perf_test.00003.sql 65521 85044
test.perf_test.00004.sql 85045 98288
test.perf_test.00006.sql 131056 148827
test.perf_test.00007.sql 148828 170088
test.perf_test.00008.sql 170089 191349
test.perf_test.00009.sql 191350 196591
test.perf_test.00012.sql 262126 276393

But, during import, there was no order, let’s see the log:

** Message: 12:55:12.267: Thread 3 restoring `test`.`perf_test` part 1476. Progress 1 of 1589 .
** Message: 12:55:12.269: Thread 1 restoring `test`.`perf_test` part 87. Progress 2 of 1589 .
** Message: 12:55:12.269: Thread 2 restoring `test`.`perf_test` part 1484. Progress 3 of 1589 .
** Message: 12:55:12.269: Thread 4 restoring `test`.`perf_test` part 1067. Progress 4 of 1589 .
** Message: 12:55:13.127: Thread 1 restoring `test`.`perf_test` part 186. Progress 5 of 1589 .
** Message: 12:55:13.128: Thread 4 restoring `test`.`perf_test` part 1032. Progress 6 of 1589 .

With these max_id and max_id per file:

File min_id max_id
test.perf_test.01476.sql 31381237 31402497
test.perf_test.00087.sql 1849708 1870968
test.perf_test.01484.sql 31551325 31572585
test.perf_test.01067.sql 22685488 22706748
test.perf_test.00186.sql 3954547 3975807
test.perf_test.01032.sql 21941353 21962613

With this kind of insert order, you can only imagine the amount of page splits that cause the fragmentation in the InnoDB datafile.

Timings were:

0.10.5/mydumper/myloader  -t 1 6:52
0.10.5/mydumper/myloader  -t 4 4:55

In v0.10.7-2 we have the same behavior:

 

But we have a small performance increase:

0.10.7-2/mydumper/myloader  -t 1 6:49 
0.10.7-2/mydumper/myloader  -t 4 4:47

We see the same pattern, even if we use the –innodb-optimize-keys:

 

The main difference is the index creation stage.

0.10.7-2/mydumper/myloader --innodb-optimize-keys -t 1 6:07 
0.10.7-2/mydumper/myloader --innodb-optimize-keys -t 4 5:53

Now, in v0.10.9, where we have table and file sorting, the graphs have a significant change: 

 

It is also a bit shocking the difference between the 2 graphs, not about color trending, but about the number of pages used which indicates a high fragmentation when multiple threads are used.

master/mydumper/myloader  -t 1 5:50 
master/mydumper/myloader  -t 4 4:29

Let’s check now with –innodb-optimize-keys:

 

This is what we are looking for! As you can see with 1 thread is perfect, but with 4 threads there is some odd distribution, but for sure, much better than the other options.

However, the timings are not the best:

master/mydumper/myloader --innodb-optimize-keys -t 1 5:33 
master/mydumper/myloader --innodb-optimize-keys -t 4 5:10

Let’s compare them:

Data       | Index      | Total      | Table 
0 00:05:50 | 0 00:00:00 | 0 00:05:50 | `test`.`perf_test`  -t 1 
0 00:04:29 | 0 00:00:00 | 0 00:04:29 | `test`.`perf_test`  -t 4 
0 00:02:33 | 0 00:02:59 | 0 00:05:33 | `test`.`perf_test`  -t 1 --innodb-optimize-keys 
0 00:02:01 | 0 00:03:09 | 0 00:05:10 | `test`.`perf_test`  -t 4 --innodb-optimize-keys

But that makes sense if you read this blog post. Actually, it would be really nice to have a feature that determines when –innodb-optimize-keys needs to be used.

Conclusions

Version 0.10.9 of MyDumper will allow myloader to insert better than previous versions. Multithreaded inserts sorted by Primary Key are now possible and faster than ever!


Viewing all articles
Browse latest Browse all 1786

Trending Articles