Let Percona Actively Manage Your Databases To Achieve Peak Performance

November 13, 2019, 8:05 am

Data drives every aspect of your business, so your databases need to deliver optimum performance and availability to keep you competitive.

A business requires 24x7x365 database coverage. Keeping your databases stable, performant, and optimized is crucial to your business success.

We understand that finding and retaining qualified DBAs to manage mission-critical database environments can be difficult — Percona is ideally placed to help you meet this challenge. Our Managed Services team is on hand to provide database support at a fraction of the cost of a dedicated, in-house, full-time DBA.

How Percona Managed Services Can Help Your Business

Percona has introduced two new Managed Services options to help you feel confident that your database is performing at the highest level.

Percona Managed Database Services (PMDS) is a flexible, managed database service that delivers exceptional, enterprise-grade expertise across a variety of high-performance enterprise database environments.

PMDS gives you affordable in-depth technical expertise on demand. Our experts proactively monitor your databases around the clock, keeping them running at an optimum level, whether on-premises or in the cloud. This lets us reduce critical incidents and meet database performance goals, so your engineering team can focus on your core business.

The Key Benefits of PMDS:

Proactive database monitoring and alert/response mean your database has 24x7x365 oversight.
With our robust change management system, you can feel confident that any required database modifications follow industry best practices.
When problems do occur, incident management and root cause analysis (RCA) services are in place to identify and solve issues quickly.
Inclusive allocated DBA hours allow you to implement Percona’s recommendations without delay.
Your dedicated service delivery manager (SDM) holds monthly calls and produces a report card.
Automated reports provide you with monthly security assessments and a weekly health check.

Please visit our webpage to find out more and to download our latest datasheet.

To accompany PMDS we also offer Percona Advanced Managed Database Service (PAMDS). PAMDS is an enhanced Percona service that offers your business additional, specific and in-depth database reporting. This service is ideal for businesses with volatile, complex, or rapidly-expanding database environments.

You can add PAMDS at any point in your engagement with Percona. Please visit our webpage to find out more and to download our latest datasheet.

Percona Monitoring and Management

As part of our managed services offerings, we also utilize Percona Monitoring and Management (PMM). PMM is an award-winning, free, open source platform for managing and monitoring the performance of your database environment.

PMM allows you to visualize query performance for MySQL, PostgreSQL, and MongoDB environments and is used by thousands of organizations worldwide. With PMM you can monitor multiple databases, multiple technologies, and data from multiple providers easily and quickly, regardless of location.

Contact us

Percona Managed Services provides tailored managed database services for your business and helps you lower costs and manage your database complexity.

Our Managed Services team provides deep operational knowledge of MySQL, MongoDB, MariaDB, Amazon AWS, Amazon Aurora and Amazon RDS, Microsoft Azure, and Google Cloud Platform to ensure your database performs at the highest level.

For more information on how Percona Managed Services can help your business, please contact us at +1-888-316-9775 (USA), +44 203 608 6727 (Europe), or have us reach out to you.

↧

Blog Poll: Who Chooses Your Database Technology?

November 14, 2019, 9:54 am

It’s that time again – a new blog poll! This time around, we’re interested in hearing who chooses your database technology. Is it DBAs? Management? Here’s the question: Who Chooses the Database Technology for a New Application in Your Organization?

Last year, we asked you a few questions in a blog poll and we received a great amount of feedback. We wanted to follow up on some of those same survey questions to see what may have changed. We’d love to hear from you!

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

This poll will be up for one month and will be maintained over in the sidebar should you wish to come back at a later date and take part. We look forward to seeing your responses!

↧

Don’t Use MongoDB Profiling Level 1

November 15, 2019, 6:33 am

TLDR: It is not profile level 1 that is the problem; it’s a gotcha with the optional ‘slowms’ argument that causes users to accidentally set verbose logging and fill their disk with log files.

In MongoDB, there are two ways to see, with individual detail, which operations were executed and how long they took.

Profiling. Saves the operation details to a capped collection, system.profile. You access this information through a MongoDB connection.
Log lines of the “COMMAND” component type in the mongod log files (and, from v4.0, the mongos log files). You have to access the files as a Unix or Windows user and work with them as text.

Profiling is low-cost, but it is limited to keeping only a small snapshot. It has three levels: 0 (off), 1 (slow operations only), and 2 (all operations).

The (log file) logger also has levels of its own, but there is no ‘off’. Even at level 0, it prints any slow operation in a “COMMAND” log line. ‘Slow’ is defined by the configured slowOpThresholdMs option (originally “slowMS”). That is 100ms by default, which is a good default in my opinion.

Usability problem

To the user, the log-or-not code and the profile-or-not code are unrelated systems, but they share the same global variable for ‘slow’: serverGlobalParams.slowMS

The basic post-mortem will go:

Someone uses db.setProfilingLevel(1/*slowOp*/, 0/*ms*/) to start profiling all operations (DON’T DO THIS – use db.setProfilingLevel(2/*all*/) instead).
The logger starts writing every command to the log files.
They execute db.setProfilingLevel(0/*off*/) to turn the profiler off.
The logger continues writing every command to the log files because slowMS is still 0ms.
The DBA gets paged after hours because the disk has filled up with log files, and thinks ‘oh, I should have set up log rotation on that server; I’ll do it in the morning’.
The DBA gets woken up in the small hours of the morning because the new primary node has also crashed due to a full disk.

So that’s the advice: Until MongoDB is enhanced to have separate slow-op threshold options for the profiler, never use db.setProfilingLevel(1, …). Even if you know the gotcha, someone learning over your shoulder won’t see it.

What to do instead:

Use only db.setProfilingLevel(0 /*off*/) <-> db.setProfilingLevel(2 /*all*/) and don’t touch slowms
- There is still a valid use case for using profiler at level 1, but if you are not taking on the responsibility of looking after the log file’s code interdependence on the slowMS value, don’t go there.
If you want the logger to print “COMMAND” lines for every command, including fast ones, use db.setLogLevel(1, "command"). And run db.setLogLevel(-1, "command") to stop it again. ‘Use log level 1, not profile level 1’ could almost be the catchphrase of this article.
(db.setLogLevel(1) + db.setLogLevel(0) is an alternative to the above, but is an older, blunter method.)
If you want to set the sampleRate you can do that without changing level (or ‘slowms’) with the following command: db.setProfilingLevel(db.getProfilingStatus().was, {"sampleRate": <new float value>})
If you want a different threshold for slowMS permanently, use the slowOpThresholdMs option in the config file, but you can also do it dynamically as for the sampleRate instructions above.

Some other gotchas to do with profiling and logging:

The logger level is global; the slowOpThresholdMs a.k.a. slowMS value is global, but profiling level is per db namespace.
All of the above are local only to the mongod (or mongos) the commands are run on / the config file is set for. If you run it on a primary it does not change the secondaries, and if you run it on a mongos it does not change the shard or the configsvr replicaset nodes.
mongos nodes only provide a subset of these diagnostic features. They have no collections of their own, so for starters, they cannot make a system.profile collection.

bool shouldDBProfile(bool shouldSample = true) {
    // Profile level 2 should override any sample rate or slowms settings.
    if (_dbprofile >= 2)
        return true;

    if (!shouldSample || _dbprofile <= 0)
        return false;

    /* Blog: and by elimination if _dbprofile == 1: */
    return elapsedTimeExcludingPauses() >= Milliseconds{serverGlobalParams.slowMS};
}

bool CurOp::completeAndLogOperation(OperationContext* opCtx,
                                    logger::LogComponent component,
                                    boost::optional<size_t> responseLength,
                                    boost::optional<long long> slowMsOverride,
                                    bool forceLog) {
    // Log the operation if it is eligible according to the current slowMS and sampleRate settings.
    const bool shouldLogOp = (forceLog || shouldLog(component, logger::LogSeverity::Debug(1)));
    const long long slowMs = slowMsOverride.value_or(serverGlobalParams.slowMS);
    ...
    ...
    const bool shouldSample =
        client->getPrng().nextCanonicalDouble() < serverGlobalParams.sampleRate;

    if (shouldLogOp || (shouldSample && _debug.executionTimeMicros > slowMs * 1000LL)) {
        auto lockerInfo = opCtx->lockState()->getLockerInfo(_lockStatsBase);

    ...
    // Return 'true' if this operation should also be added to the profiler.
    return shouldDBProfile(shouldSample);
}
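
The interplay between the two excerpts above can be reproduced with a small model. This is only a sketch, not MongoDB's implementation: the class ToyServer and its method names are hypothetical, sampleRate is ignored, and the only thing carried over from the source is that the logger and the profiler read the same slow-operation threshold.

```python
class ToyServer:
    """Simplified stand-in for one mongod; not MongoDB's actual code."""

    def __init__(self):
        self.slow_ms = 100          # stands in for serverGlobalParams.slowMS
        self.profile_level = 0      # 0 = off, 1 = slow ops only, 2 = all
        self.command_log_level = 0  # verbosity of the "COMMAND" log component

    def set_profiling_level(self, level, slow_ms=None):
        """Mimics db.setProfilingLevel(level, slowms); slowms is optional."""
        self.profile_level = level
        if slow_ms is not None:
            # Side effect: the logger reads this same value.
            self.slow_ms = slow_ms

    def should_log(self, elapsed_ms):
        # The logger prints everything when verbose, otherwise only slow ops.
        return self.command_log_level >= 1 or elapsed_ms > self.slow_ms

    def should_profile(self, elapsed_ms):
        if self.profile_level >= 2:
            return True
        if self.profile_level <= 0:
            return False
        return elapsed_ms >= self.slow_ms


server = ToyServer()
fast_op_ms = 5
print(server.should_log(fast_op_ms))      # False: 5 ms is under the 100 ms default

server.set_profiling_level(1, 0)          # the gotcha: slowms is set to 0
server.set_profiling_level(0)             # profiler off, slowms left untouched
print(server.should_profile(fast_op_ms))  # False: profiling really is off
print(server.should_log(fast_op_ms))      # True: every command is still logged
```

In this model, switching only between levels 0 and 2 never touches slow_ms, which is the reason for the advice above.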

↧

PMM 2.1, MongoDB Hot Backups, Percona Server Updates: Release Roundup 11/18/2019

November 18, 2019, 8:30 am

It’s release roundup time here at Percona!

As mentioned a few weeks ago, we are now publishing release roundups comprising all the details and information you need on the releases from Percona over the previous week or two. This post covers releases from November 4, 2019 – November 18, 2019.

Each roundup will showcase the latest in software updates, tools, and features to help you manage and deploy our software, with highlights and critical information, as well as links to the full release notes and direct links to the software or service itself.

In this edition, we highlight two recent version updates to Percona Server for MySQL, improvements and new features in Percona Monitoring and Management 2.1.0, and some very cool new functions in Percona Server for MongoDB 4.2.1-1, including streaming hot backups in all our active MongoDB releases.

Percona Server for MySQL 5.6.46-86.2

On November 6, 2019, we released Percona Server for MySQL version 5.6.46-86.2, the current GA release in the 5.6 series. It includes several bug fixes, including a fix for audit log filtering by user not working, and the addition of a package version for the Red Hat Package Manager (rpm). Percona Server for MySQL is an enhanced drop-in replacement for MySQL.

Download Percona Server for MySQL 5.6.46-86.2

Percona Monitoring and Management 2.1.0

PMM 2.1.0, a free and open-source platform for managing and monitoring MySQL, MongoDB, and PostgreSQL performance, was released on November 11, 2019. This version has bug fixes and many new features, including a latency detail graph, additional log and config files, and the disabling of heavy-load collectors automatically when there are too many tables.

NOTE: Percona Monitoring and Management (PMM) employs a client/server model. You must download and install both the client and server applications. The directions for doing this are in the documentation.

Download Percona Monitoring and Management 2.1.0

Percona Server for MongoDB 4.2.1-1

On November 13, 2019, we released Percona Server for MongoDB version 4.2.1-1, which includes all of the new features of the latest version of MongoDB 4.2 Community Edition, as well as the Percona Memory Engine storage engine, encrypted WiredTiger storage engine, and enhanced query profiling. Percona Server for MongoDB 4.2.1-1 adds the ability to stream hot backups remotely to Amazon S3 or compatible storage such as MinIO; this feature is now included in all our active MongoDB releases (3.6, 4.0, and 4.2).

Download Percona Server for MongoDB 4.2.1-1

Percona Server for MySQL 5.7.28-31

As of November 13, 2019, Percona Server for MySQL 5.7.28-31 is the current GA (Generally Available) release in the 5.7 series. It is based on MySQL 5.7.28 and includes all the bug fixes in it. If you’re currently using Percona Server for MySQL 5.7, Percona recommends upgrading to this version of 5.7 prior to upgrading to Percona Server 8.0. Percona Server for MySQL is trusted by thousands of enterprises to provide better performance and concurrency for their most demanding workloads.

Download Percona Server for MySQL 5.7.28-31

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training and software for MySQL, MariaDB, MongoDB, PostgreSQL, and other open source databases in on-premises and cloud environments.

↧

Installing MySQL with Docker

November 19, 2019, 6:05 am

I often need to install a certain version of MySQL, MariaDB, or Percona Server for MySQL to run some experiments, whether to check for behavior differences or to provide tested instructions. In this blog series, I will look into how you can install MySQL, MariaDB, or Percona Server for MySQL with Docker. This post, part one, is focused on MySQL Server.

Docker is not my preferred way to install, as it does not match a typical production install: service control behavior and file layout are quite different. What is great about Docker, though, is that it makes installing the latest MySQL version (or any other version) very easy.

Docker also is easy to use when you need a simple, single instance. If you’re looking into some replication-related behaviors, DBDeployer may be a better tool for that.

These instructions are designed to get a test instance running quickly and easily; you do not want to use these for production deployments. All instructions below assume Docker is already installed.

First, you should know there are not one but two “official” MySQL Docker Repositories. One of them is maintained by the Docker Team and is available by a simple docker run mysql:latest. The other one is maintained by the MySQL Team at Oracle and would use a docker run mysql/mysql-server:latest syntax. In the examples below, we will use MySQL Team’s Docker images, though the Docker Team’s work in a similar way.

Installing the Latest MySQL Version with Docker

docker run --name mysql-latest  \
-p 3306:3306 -p 33060:33060  \
-e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD='strongpassword'   \
-d mysql/mysql-server:latest

This will start an instance of the latest MySQL version that is remotely accessible from anywhere with the specified root password. This is easy for testing, but not a good security practice (which is why it is not the default).

Connecting to MySQL Server Docker Container

Installing with Docker means you do not get any tools, utilities, or libraries available on your host directly, so you either install these separately, access the created instance from a remote host, or use the command line tools shipped with the Docker image.

To Start MySQL Command Line Client with Docker Run:

docker exec -it mysql-latest mysql -uroot -pstrongpassword

To Start MySQL Shell with Docker Run:

docker exec -it mysql-latest mysqlsh -uroot -pstrongpassword

Managing MySQL Server in Docker Container

When you want to stop the MySQL Server Docker Container run:

docker stop mysql-latest

If you want to restart a stopped MySQL Docker container, you should not try to use docker run to start it again. Instead, you should use:

docker start mysql-latest

If something is not right, for example, if the container is not starting, you can access its logs using this command:

docker logs mysql-latest

If you want to re-create a fresh docker container from scratch you can run:

docker stop mysql-latest
docker rm mysql-latest

Followed by the docker run command described above.

Passing Command Line Options to MySQL Server in Docker Container

If you want to pass some command line options to MySQL Server, you can do it this way:

docker run --name mysql-latest  \
-p 3306:3306 -p 33060:33060  \
-e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD='strongpassword'   \
-d mysql/mysql-server:latest \
--innodb_buffer_pool_size=256M \
--innodb_flush_method=O_DIRECT

Running Different MySQL Server Versions in Docker

If you just want to run one MySQL version at a time in Docker container, it is easy – you can just pick the version you want with Docker Image Tag and change the Name to be different in order to avoid name conflict:

docker run --name mysql-8.0.17  \
-p 3306:3306 -p 33060:33060  \
-e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD='strongpassword'   \
-d mysql/mysql-server:8.0.17

This will start MySQL 8.0.17 in Docker Container.

docker run --name mysql-5.7  \
-p 3306:3306 -p 33060:33060  \
-e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD='strongpassword'   \
-d mysql/mysql-server:5.7

And this will start the latest MySQL 5.7 in Docker.

Running Multiple MySQL Server Versions at the Same Time in Docker

The potential problem of running multiple MySQL Versions in Docker at the same time is TCP port conflict. If you do not access Docker Container from outside, and just run utilities included in the same container, you can just remove port mapping (-p option) and you can run multiple containers:

docker run --name mysql-latest  \
-e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD='strongpassword'   \
-d mysql/mysql-server:latest

docker run --name mysql-8.0.17  \
-e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD='strongpassword'   \
-d mysql/mysql-server:8.0.17

In the more common case where you need to access Docker containers externally, you will want to map them to different external ports. For example, to start the latest MySQL 8 at ports 3306/33060 and MySQL 8.0.17 at 3307/33070, we can use:

docker run --name mysql-latest  \
-p 3306:3306 -p 33060:33060  \
-e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD='strongpassword'   \
-d mysql/mysql-server:latest


docker run --name mysql-8.0.17  \
-p 3307:3306 -p 33070:33060  \
-e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD='strongpassword'   \
-d mysql/mysql-server:8.0.17
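
If you start several versions regularly, the port-shifting pattern above can be generated rather than typed. The following is a minimal sketch: the helper name docker_run_command and the index-based port offsets are my own assumptions for illustration, while the image name, environment variables, and container ports are the ones used in the commands above. It only prints the commands; it does not run Docker.

```python
import shlex


def docker_run_command(tag, index, root_password="strongpassword"):
    """Build a `docker run` command line for mysql/mysql-server:<tag>.

    `index` shifts the host ports so containers do not collide:
    0 -> 3306/33060, 1 -> 3307/33070, 2 -> 3308/33080, ...
    """
    classic_port = 3306 + index
    x_protocol_port = 33060 + 10 * index
    args = [
        "docker", "run", "--name", f"mysql-{tag}",
        "-p", f"{classic_port}:3306",
        "-p", f"{x_protocol_port}:33060",
        "-e", "MYSQL_ROOT_HOST=%",
        "-e", f"MYSQL_ROOT_PASSWORD={root_password}",
        "-d", f"mysql/mysql-server:{tag}",
    ]
    # shlex.quote protects any argument that needs shell quoting.
    return " ".join(shlex.quote(arg) for arg in args)


for i, tag in enumerate(["latest", "8.0.17", "5.7"]):
    print(docker_run_command(tag, i))
```

As with everything else in this post, this is meant for test instances only.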

There are a lot more things to consider if you’re going to use MySQL on Docker for anything beyond testing. For more information, check out the MySQL Server page on Docker Hub and the MySQL Manual.

↧

Proposal for Global Indexes in PostgreSQL

November 20, 2019, 7:26 am

A global index, by definition, is a single index on the parent table that maps to many underlying table partitions. The parent table itself does not have a single, unified underlying store, so it must retrieve the data satisfying index constraints from physically distributed tables. In very crude terms, the global index accumulates data in one place so that data spanning multiple partitions is accessed in one go, as opposed to querying each partition individually.

Currently, there is no Global Index implementation available in PostgreSQL, and therefore I want to propose a new feature. I have sent a proposal to the community, and the discussion has now started. In this proposal, I ask for Global Index support just for B-Tree and will consider other index methods later.

Terminologies used

Global Indexes

A one-to-many index, in which one index maps to all the partitions of a table.

Partitioned Index (Index Partitioning)

When global indexes become too large, they are partitioned to keep the performance and maintenance overhead manageable. These are not within the scope of this work.

Local Index

A local index is an index that is local to a specific table partition; i.e. it doesn’t span across multiple partitions. So, when we create an index on a parent table, it will create a separate index for all its partitions. PostgreSQL uses the terminology of “partitioned index” when it refers to local indexes. This work will fix this terminology for PostgreSQL so that the nomenclature remains consistent with other DBMS.

Why Do We Need Global Index in PostgreSQL?

A global index is expected to give two very important upgrades to the partitioning feature set in PostgreSQL. It is expected to give a significant improvement in read-performance for queries targeting multiple local indexes of partitions, as well as adding a unique constraint across partitions.

Unique Constraint

Data uniqueness is a critical requirement for building an index. For global indexes that span across multiple partitions, uniqueness will have to be enforced on index column(s). This effectively translates into a unique constraint.

Performance

Currently, the pseudo index created on the parent table of partitions does not contain any data. Rather, it dereferences to the local indexes when an index search is required. This means that multiple indexes will have to be evaluated with data to be combined thereafter. However, with the global indexes, data will reside with the global index declared on the parent table. This avoids the need for multi-level index lookups, so read performance is expected to be significantly higher in some cases. There will, however, be a negative performance impact during write (insert/update) of data. This is discussed in more detail later on.
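
The read-side difference can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how PostgreSQL stores or scans indexes: partitions and indexes are plain dictionaries, the table and function names are hypothetical, and it ignores partition pruning, which lets PostgreSQL skip partitions when the query constrains the partition key.

```python
# Each partition has its own local index (key -> row).
partitions = {
    "orders_2018": {101: "row A", 102: "row B"},
    "orders_2019": {201: "row C", 202: "row D"},
}


def lookup_local(key):
    """Probe each partition's local index in turn; count indexes touched."""
    probes = 0
    for name, local_index in partitions.items():
        probes += 1
        if key in local_index:
            return name, local_index[key], probes
    return None, None, probes


# A global index holds one entry per key, pointing at the owning partition.
global_index = {
    key: name
    for name, local_index in partitions.items()
    for key in local_index
}


def lookup_global(key):
    """A single index lookup identifies the partition that owns the key."""
    name = global_index.get(key)
    if name is None:
        return None, None, 1
    return name, partitions[name][key], 1


print(lookup_local(202))   # ('orders_2019', 'row D', 2)
print(lookup_global(202))  # ('orders_2019', 'row D', 1)
```

The same single structure is what would make a cross-partition uniqueness check possible: a duplicate key shows up as an existing entry in global_index, whichever partition it lives in.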

Creating a Global Index – Syntax

A global index may be created with the addition of a “GLOBAL” keyword to the index statement. Alternatively, one could specify the “LOCAL” keyword to create local indexes on partitions. We suggest calling this set of keywords “partition_index_type”. By default, partition_index_type will be set as LOCAL. Here is a sample of the create index syntax.

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
    ( { column_name | ( expression ) } [ LOCAL | GLOBAL ] [ COLLATE collation ]
    [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ INCLUDE ( column_name [, ...] ) ]
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]

Pointing Index to Tuple

Currently, CTID carries a page and offset information for a known heap (table name). However, in the context of global indexes, this information within an index is insufficient. Since the index is expected to carry tuples from multiple partitions (heaps), CTID alone will not be able to link an index node to a tuple. This requires carrying additional data for the heap name to be stored with each index node.
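
To make the problem concrete, here is a small sketch of what an index entry would need to carry. The type and field names (Ctid, GlobalIndexEntry, heap_id) are hypothetical and chosen only for illustration; the proposal does not fix a representation.

```python
from typing import NamedTuple


class Ctid(NamedTuple):
    """Physical location of a tuple inside one heap: page and offset."""
    block: int
    offset: int


class GlobalIndexEntry(NamedTuple):
    key: int
    heap_id: str  # hypothetical identifier of the partition (heap)
    ctid: Ctid


# Two partitions, each with a tuple at the same page and offset.
heaps = {
    "part_a": {Ctid(0, 1): "tuple stored in part_a"},
    "part_b": {Ctid(0, 1): "tuple stored in part_b"},
}

entries = [
    GlobalIndexEntry(key=42, heap_id="part_a", ctid=Ctid(0, 1)),
    GlobalIndexEntry(key=43, heap_id="part_b", ctid=Ctid(0, 1)),
]


def fetch(entry):
    """Resolve an index entry; the CTID alone would be ambiguous here."""
    return heaps[entry.heap_id][entry.ctid]


for entry in entries:
    print(entry.key, fetch(entry))
```

Both entries share the CTID (0, 1), so only the extra heap identifier tells them apart.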

Optimizer

The challenge for the optimizer is choosing between local and global indexes when both are present. There are many open questions, including how to evaluate the cost of scanning a global index. When should the LOCAL index be preferred over the GLOBAL index, and vice versa?

Write Performance and Vacuum

There will be some write performance degradation because every change in the partition tables must propagate upwards to the GLOBAL index on the parent table. This can be thought of as another index on a table; however, the [slight] performance degradation will be due to the fact that the GLOBAL index may carry a much bigger dataset, with data from multiple partitions, resulting in higher tree traversal and update times. This applies to both write and vacuum processes.

It is still an open question, though, on how this will be handled within the code and how we can better optimize this process.

Conclusion

As we know, most major DBMS engines that have partitioning support also support global indexes. PostgreSQL has very powerful partitioning support but lacks support for global indexes. A global index not only ensures uniqueness across partitions but also improves read performance. I have sent the proposal to the PostgreSQL community, and while a discussion has started, it is a slow process. If you are an engineer and want to contribute, respond to that thread in the community. If you are a user and have some use cases, please share them on the same mail chain.

↧

Profiling Software Using perf and Flame Graphs

November 20, 2019, 10:47 am

In this blog post, we will see how to use perf (a.k.a.: perf_events) together with Flame Graphs. They are used to generate a graphical representation of what functions are being called within our software of choice. Percona Server for MySQL is used here, but it can be extended to any software you can take a resolved stack trace from.

Before moving forward, a word of caution. As with any profiling tool, DON’T run this in production systems unless you know what you are doing.

Installing Packages Needed

For simplicity, I’ll use commands for CentOS 7, but things should be the same for Debian-based distros (apt-get install linux-tools-$(uname -r) instead of the yum command is the only difference in the steps).

To install perf, simply issue:

SHELL> sudo yum install -y perf

To get Flame Graphs project:

SHELL> mkdir -p ~/src
SHELL> cd ~/src
SHELL> git clone https://github.com/brendangregg/FlameGraph

That’s it! We are good to go.

Capturing Samples

Flame Graphs are a way of visualizing data, so we need some samples to base them on. There are three ways in which we can do this. (Note that we will use the -p flag to only capture data from our process of interest, but we can potentially capture data from all the running processes if needed.)

1- Capture for a set amount of time only (ten seconds here):

SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- sleep 10

2- Capture until we send the interrupt signal (CTRL-C):

SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld)

3- Capture for the whole lifetime of the process:

SHELL> sudo perf record -a -F 99 -g -- /sbin/mysqld \
--defaults-file=/etc/percona-server.conf.d/mysqld.cnf --user=mysql

SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- mysql -e "SELECT * FROM db.table"

We are forced to capture data from all processes in the first case of the third variant since it’s impossible to know the process ID (PID) number beforehand (with the command executed, we are actually starting the MySQL service). This type of command comes in handy when you want to have data from the exact beginning of the process, which is not possible otherwise.

In the second variant, we are running a query on an already-running MySQL service, so we can use the -p flag to capture data on the server process. This is handy if you want to capture data at the exact moment a job is running, for instance.

Preparing the Samples

After the initial capture, we will need to make the collected data “readable”. This is needed because it is stored in binary format by perf record. For this we will use:

SHELL> sudo perf script > perf.script

It will read perf.data by default, which is the same default perf record uses for its output file. It can be overridden by using the -i flag and -o flag, respectively.

We will now be able to read the generated text file, as it will be in a human-readable form. However, when doing so, you will quickly realize why we need to aggregate all this data into a more intelligible form.

Generating the Flame Graphs

We can do the following in a one-liner, by piping the output of the first as input to the second. Since we didn’t add the FlameGraph git folder to our path, we will need to use full paths.

SHELL> ~/src/FlameGraph/stackcollapse-perf.pl perf.script | ~/src/FlameGraph/flamegraph.pl > flamegraph.svg
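
To give an idea of what happens between the two scripts: the collapse step folds every sampled stack into a single line of semicolon-separated frames followed by a count, and flamegraph.pl draws from those lines. The sketch below mimics that folding on made-up samples. It is not the Perl script's implementation, and apart from write_record (which shows up later in this post) the frame names are hypothetical.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks into 'root;...;leaf count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


# Each sample is one captured stack, listed from the root frame to the leaf.
samples = [
    ["mysqld", "handle_query", "write_record"],
    ["mysqld", "handle_query", "write_record"],
    ["mysqld", "handle_query", "read_row"],
]

for line in fold_stacks(samples):
    print(line)
# mysqld;handle_query;read_row 1
# mysqld;handle_query;write_record 2
```

The width of each box in the final graph is proportional to these counts, which is why identical stacks are merged first.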

We can now open the .svg file in any browser and start analyzing the information-rich graphs.

How Does it Look?

As an example, I will leave full commands, their outputs, and a screenshot of a flame graph generated by the process using data capture method #2. We will run an INSERT INTO … SELECT query to the database, so we can then analyze its execution.

SHELL> time sudo perf record -a -F 99 -g \
-p $(pgrep -x mysqld) \
-- mysql test -e "INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;"
Warning:
PID/TID switch overriding SYSTEM
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 1.909 MB perf.data (8214 samples) ]

real 1m24.366s
user 0m0.133s
sys 0m0.378s

SHELL> sudo perf script | \
~/src/FlameGraph/stackcollapse-perf.pl | \
~/src/FlameGraph/flamegraph.pl > mysql_select_into_flamegraph.svg

The keen-eyed reader will notice we went one step further here and joined steps #2 and #3 via a pipe (|) to avoid writing to and reading from the perf.script output file. Additionally, there are time outputs so we can get an estimate of the amount of data the tool generates (~2 MB in 1 min 25 secs); this will, of course, vary depending on many factors, so take it with a pinch of salt, and test in your own environment.
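
For capacity planning, the figures printed by perf record above can be turned into rates. This is plain arithmetic on this one run, so treat the result as an order of magnitude rather than a prediction for your workload.

```python
# Figures copied from the perf record output above.
captured_mb = 1.909
samples = 8214
elapsed_s = 1 * 60 + 24.366  # "real 1m24.366s"

mb_per_hour = captured_mb / elapsed_s * 3600
samples_per_s = samples / elapsed_s

print(f"{mb_per_hour:.0f} MB/hour, {samples_per_s:.0f} samples/s")
```

That works out to roughly 80 MB of perf.data per hour for this workload, with a sample rate close to the 99 Hz requested via -F 99.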

The resulting flame graph is:

[Image: perf and Flame Graphs]

One clear candidate for optimization is work around write_record: if we can make that function faster, there is a lot of potential for reducing overall execution time (squared in blue in the bottom left corner, we can see a total of ~60% of the samples were taken in this codepath). In the last section below we link to a blog post explaining more on how to interpret a Flame Graph, but for now, know you can mouse-over the function names and it will dynamically change the information shown at the bottom left corner. You may also visualize it better with the following guides in place:

[Image: flame graphs]

Conclusion

In the Support team, we use this procedure in many cases where we need an in-depth view of what MySQL is executing, and for how long. This gives us better insight into what operations are behind a specific workload, so we can act accordingly. This procedure can be used either for optimizing or troubleshooting and is a very powerful tool in our tool belt! It’s known that humans are better at processing images than text, and this tool exploits that brilliantly, in my opinion.

Tips for Designing Grafana Dashboards

November 22, 2019, 6:28 am

As Grafana powers our star product – Percona Monitoring and Management (PMM) – we have developed a lot of experience creating Grafana Dashboards over the last few years. In this article, I will share some of the considerations for designing Grafana Dashboards. As usual with questions of design, these are quite subjective, and I do not expect you to choose to apply all of them to your dashboards, but I hope they will help you to think through your dashboard design better.

Design Practical Dashboards

Grafana features many panel types, and even more are available as plugins. It may be very tempting to use many of them in your dashboards, with many different visualization options. Do not! Stick to a few data visualization patterns and only add additional visualizations when they provide additional practical value, not because they are cool. Graph and Singlestat panel types probably cover 80% of use cases.

Do Not Place Too Many Graphs Side by Side

This will probably depend a lot on how your dashboards are used. If your dashboard is designed for large screens placed on the wall, you may be able to fit more graphs side by side. If your dashboard needs to scale down to a small, lower-resolution laptop screen, I would suggest sticking to 2-3 graphs in a row.

Use Proper Units

Grafana allows you to specify a unit for the data type displayed. Use it! Without a unit set, values will not be properly shortened and will be very hard to read:

Grafana Dashboards

Compare this to

Grafana Dashboards2

Mind Decimals

You can specify the number of digits you want to display after the decimal point or leave it at the default. I found the default does not always work very well, for example here:

Grafana Dashboards3

For some reason, on the panel axis we have way too many digits displayed after the decimal point. Grafana also often picks three digits after the decimal point, as in the table below, which I find inconvenient: at a glance, it is hard to tell whether we’re dealing with a decimal point or with “,” as a “thousands” separator, so I may be looking at 2462 GiB there. While that is not plausible in this case, there are cases, such as data rates, where a 1000x value difference is quite possible. Instead, I prefer setting it to two decimals (or one if that is enough), which makes it clear that we’re not looking at thousands.

Label your Axis

You can label your axis, which especially makes sense if the presentation is not obvious, as in this example: we’re using negative values to plot writes to a swap file.

Grafana Dashboards4

Use Shared Crosshair or Tooltip

In Dashboard Settings, you will find the “Graph Tooltip” option, which you can set to “Default”, “Shared Crosshair”, or “Shared Tooltip”. This is how these will look:

Grafana Dashboards5

Grafana Dashboards 6

Shared crosshair shows the line matching the same time on all panels, while shared tooltip shows the tooltip value on all panels at the same time. You can pick what makes sense for you; my favorite is using the tooltip setting because it allows me to visually compare the same time without making the dashboard too slow and busy.

Note there is a handy shortcut, CTRL-O, which allows you to cycle between these settings on any dashboard.

Pick Colors

If you’re displaying truly dynamic information you will likely have to rely on Grafana’s automatic color assignment, but if not, you can pick specific colors for all values being plotted. This will prevent colors from potentially being re-assigned to different values without you planning to do so.

Grafana Dashboards 7

When picking colors, you also want to make sure they make logical sense. For example, I think “green” is a better color than “red” for free memory. As you pick the colors, use the same colors for the same type of information when you show it on different panels, if possible, because that makes it easier to understand.

I would even suggest sticking to the same (or similar) colors for the same kind of data. If you have many panels that show disk input and output, using similar colors across them can be a good idea.

Fill Stacking Graphs

Grafana does not require it, but I would suggest you use filling when you display stacking data and don’t use filling when you’re plotting multiple independent values. Take a look at these graphs:

In the first graph, I need to look at the actual value of the plotted line to understand what I’m looking at. In the second graph, by contrast, that value is meaningless and what is valuable is the filled amount. I can see on the second graph that the amount of Cache (the blue value) has shrunk.

I prefer using a fill factor of 6+ so it is easier to match the fill colors with colors in the table. For the same reason, I prefer not to use the fill gradient on such graphs as it makes it much harder to see the color and the filled volume.

Do Not Abuse Double Axis

Graphs that use double axis are much harder to understand. I used to use it very often, but now I avoid it when possible, only using it when I absolutely want to limit the number of panels.

Note in this case I think gradient fits OK because there is only one value displayed as the line, so you can’t get confused if you need to look at total value or “filled volume”.

Separate Data of Different Scales on Different Graphs

I used to plot InnoDB Rows Read and Written on the same graph. It is quite common for reads to be 100x higher in volume than writes, crowding them out and making even significant changes in writes very hard to see. Splitting them into different graphs solved this issue.

Consider Staircase Graphs

In monitoring applications, we often display average rates computed over a period of time. If this is the case, we do not know how the rate was changing within that period, and it would be misleading to suggest otherwise. A staircase graph especially makes sense if you’re displaying only a few data points.

Let’s look at this graph which is being viewed with one-hour resolution:

This visually suggests that the number of rows read was falling from 16:00 to 18:00, and if we compare it to the staircase graph:

It simply shows us that the value at 18:00 was higher than at 17:00, but does not make any claim about the change.

This display, however, has another issue. Let’s look at the same data set with 5min resolution:

We can see the average value from 16:00 to 17:00 was lower than from 17:00 to 18:00, but this is however NOT what the lower resolution staircase graph shows – the value for 17 to 18 is actually lower!

The reason is that if we compute rate() over 1 hour on the Prometheus side at 17:00, it is returned as a data point for 17:00, although this average rate really covers 16:00 to 17:00. The staircase graph, however, plots it from 17:00 to 18:00, until a new value is available. It is off by one hour.

To fix it, you need to shift the data appropriately. In Prometheus, which we use in PMM, I can use an offset operator to shift the data to be displayed correctly:
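The effect of the shift can be sketched in plain Python, without Prometheus. This is a minimal illustration, assuming each sample is labeled with the end of its averaging window as described above; the function name and the sample values are made up for the example.

```python
from datetime import datetime, timedelta

def shift_to_window_start(samples, window):
    """Relabel (timestamp, value) pairs from window end to window start.

    A rate averaged over `window` and reported at time T describes the
    period [T - window, T]. A staircase graph draws each value forward
    from its timestamp, so the label has to move back by one window.
    """
    return [(ts - window, value) for ts, value in samples]

hour = timedelta(hours=1)
# Hypothetical hourly averages, labeled with the END of each window.
reported = [
    (datetime(2019, 11, 22, 17, 0), 120.0),  # really covers 16:00-17:00
    (datetime(2019, 11, 22, 18, 0), 90.0),   # really covers 17:00-18:00
]
shifted = shift_to_window_start(reported, hour)
for ts, value in shifted:
    # Each line shows the period the staircase step now covers.
    print(f"{ts:%H:%M}-{ts + hour:%H:%M}: {value}")
```

After the shift, the step drawn from 16:00 to 17:00 carries the average that was actually measured over that hour.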

Provide Multiple Resolutions

I’m a big fan of being able to see the data on the same dashboard with different resolutions, which can be done through a special dashboard variable of type “Interval”. High-resolution data can provide a great level of detail but can be very volatile.

While lower resolution can hide this level of detail, it does show trends better.

Multiple Aggregates for the Same Metrics

To get even more insights, you can consider plotting the same metrics with different aggregates applied to it:

In this case, we are looking at the same variable – threads_running – but at its average value over a period of time versus max (peak) value. Both of them are meaningful in a different way.

You can also notice here that points are used for the Max value instead of a line. This is in general good practice for highly volatile data, as plotting a line for something which changes wildly is messy and does not provide much value.
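To see why the two aggregates tell different stories, here is a small Python sketch. The sample values and the bucket size are invented for illustration; they are not taken from the dashboard above.

```python
from statistics import mean

def aggregate(samples, bucket_size):
    """Split samples into fixed-size buckets and return (avg, max) per bucket."""
    buckets = [samples[i:i + bucket_size]
               for i in range(0, len(samples), bucket_size)]
    return [(mean(b), max(b)) for b in buckets]

# Hypothetical per-second threads_running samples with one short spike.
threads_running = [2, 3, 2, 2, 40, 2, 3, 2, 2, 3, 2, 2]
result = aggregate(threads_running, 6)
for avg, peak in result:
    print(f"avg={avg:.1f} max={peak}")
```

In the first bucket the average stays modest while the maximum exposes the spike, which is the kind of detail the max series is there to show.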

Use Help and Panel Links

If you fill out a description for the panel, it will be visible when you place your mouse over the tiny “i” sign. This is very helpful for explaining what the panel shows and how to use this data. You can use Markdown for formatting. You can also provide one or more panel links, which you can use for additional help or drill-down.

With newer Grafana versions, you can even define a more advanced drill-down, which can contain different URLs based on the series you are looking at, as well as other templating variables:

Summary

This list of considerations for designing Grafana Dashboards and best practices is by no means complete, but I hope you pick up an idea or two which will allow you to create better dashboards!

↧

UUIDs are Popular, but Bad for Performance — Let’s Discuss

November 22, 2019, 10:52 am

If you do a quick web search about UUIDs and MySQL, you’ll get a fair number of results. Here are just a few examples:

So, does a well-covered topic like this one need any more attention? Well, apparently, yes. Even though most posts warn people against the use of UUIDs, they are still very popular. This popularity comes from the fact that these values can easily be generated by remote devices, with a very low probability of collision. With this post, my goal is to summarize what has already been written by others and, hopefully, bring in a few new ideas.

What are UUIDs?

UUID stands for Universally Unique IDentifier and is defined in RFC 4122. It is a 128-bit number, normally written in hexadecimal and split by dashes into five groups. A typical UUID value looks like:

yves@laptop:~$ uuidgen 
83fda883-86d9-4913-9729-91f20973fa52

There are officially five types of UUID values, versions 1 to 5, but the most common are time-based (version 1 or version 2) and purely random (version 4). The time-based UUIDs encode the number of 100 ns intervals since October 15th, 1582 in 7.5 bytes (60 bits), which is split in a “time-low”-“time-mid”-“time-hi” fashion. The missing 4 bits are the version number, used as a prefix to the time-hi field. This yields the 64 bits of the first 3 groups. The last 2 groups are the clock sequence, a value incremented every time the clock is modified, and a host unique identifier. Most of the time, the MAC address of the main network interface of the host is used as the unique identifier.

There are important points to consider when you use time-based UUID values:

It is possible to determine the approximate time when the value was generated from the first 3 fields
There are many repetitive fields between consecutive UUID values
The first field, “time-low”, rolls over every 429s
The MySQL UUID function produces version one values

Here’s an example using the “uuidgen” Unix tool to generate time-based values:

yves@laptop:~$ for i in $(seq 1 500); do echo "$(date +%s): $(uuidgen -t)"; sleep 1; done
1573656803: 572e4122-0625-11ea-9f44-8c16456798f1
1573656804: 57c8019a-0625-11ea-9f44-8c16456798f1
1573656805: 586202b8-0625-11ea-9f44-8c16456798f1
...
1573657085: ff86e090-0625-11ea-9f44-8c16456798f1
1573657086: 0020a216-0626-11ea-9f44-8c16456798f1
...
1573657232: 56b943b2-0626-11ea-9f44-8c16456798f1
1573657233: 57534782-0626-11ea-9f44-8c16456798f1
1573657234: 57ed593a-0626-11ea-9f44-8c16456798f1
...

The first field rolls over (at t=1573657086) and the second field is incremented. It takes about 429s to see similar values again for the first field. The third field changes only about once a year. The last field is static on a given host; on my laptop, the MAC address is used:

yves@laptop:~$ ifconfig | grep ether | grep 8c
    	ether 8c:16:45:67:98:f1  txqueuelen 1000  (Ethernet)
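The embedded timestamp can be recovered with Python's standard uuid module. The sketch below is only an illustration of the field layout described above; the helper name is made up, and the offset constant is the number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.

```python
import uuid
from datetime import datetime, timezone

# Number of 100 ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
GREGORIAN_TO_UNIX = 0x01B21DD213814000

def v1_unix_seconds(value):
    """Return the Unix timestamp (in seconds) embedded in a version 1 UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time reassembles time-hi, time-mid and time-low into the 60-bit count.
    return (u.time - GREGORIAN_TO_UNIX) / 10_000_000

sample = "572e4122-0625-11ea-9f44-8c16456798f1"  # first value from the run above
ts = v1_unix_seconds(sample)
print(int(ts), datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
print(f"node: {uuid.UUID(sample).node:012x}")          # host identifier field
print(f"time-low rollover: {2**32 / 10_000_000:.1f} s")  # 32 bits of 100 ns ticks
```

The decoded seconds line up with the `date +%s` values printed next to each UUID, and the node field matches the MAC address shown by ifconfig.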

The other frequently seen UUID version is 4, the purely random one. By default, the Unix “uuidgen” tool produces UUID version 4 values:

yves@laptop:~$ for i in $(seq 1 3); do uuidgen; done
6102ef39-c3f4-4977-80d4-742d15eefe66
14d6e343-028d-48a3-9ec6-77f1b703dc8f
ac9c7139-34a1-48cf-86cf-a2c823689a91

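The only “repeated” value is the version, “4”, at the beginning of the 3rd field. Two more bits at the start of the 4th field are fixed to mark the variant, which is why that field always begins with 8, 9, a, or b. The other 122 bits are random.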

What is so Wrong with UUID Values?

In order to appreciate the impact of using UUID values as a primary key, it is important to review how InnoDB organizes the data. InnoDB stores the rows of a table in the b-tree of the primary key. In database terminology, we call this a clustered index. The clustered index orders the rows automatically by the primary key.

When you insert a new row with a random primary key value, InnoDB has to find the page where the row belongs, load it in the buffer pool if it is not already there, insert the row and then, eventually, flush the page back to disk. With purely random values and large tables, any b-tree leaf page may receive the new row; there are no hot pages. Rows inserted out of primary key order cause page splits, which lead to a low fill factor. For tables much larger than the buffer pool, an insert will very likely need to read a table page from disk. The page in the buffer pool where the new row has been inserted will then be dirty. The odds the page will receive a second row before it needs to be flushed to disk are very low. Most of the time, every insert will cause two IOPs: one read and one write. The first major impact is on the rate of IOPs, and it is a major limiting factor for scalability.
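A toy model makes the "no hot pages" point concrete. The Python sketch below assumes the leaf pages split the 128-bit key space evenly and ignores page splits and row sizes entirely; the page count and number of inserts are arbitrary illustration values, not measurements.

```python
import random

def pages_touched(keys, total_pages, key_bits=128):
    """Count distinct leaf pages hit, assuming pages split the key space evenly."""
    return len({(k * total_pages) >> key_bits for k in keys})

rng = random.Random(42)          # fixed seed so the run is repeatable
TOTAL_PAGES = 1_000_000          # hypothetical table size in leaf pages
INSERTS = 10_000

random_keys = [rng.getrandbits(128) for _ in range(INSERTS)]
# Monotonic keys (e.g. auto-increment) all land at the right edge of the tree.
top = (1 << 128) - INSERTS
sequential_keys = [top + i for i in range(INSERTS)]

print("random keys:    ", pages_touched(random_keys, TOTAL_PAGES), "pages")
print("sequential keys:", pages_touched(sequential_keys, TOTAL_PAGES), "pages")
```

With random keys, almost every insert lands on a different page, while ordered keys keep hitting the same one.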

The only way to get decent performance is thus to use storage with low latency and high endurance. That’s where you’ll see the second major performance impact. With a clustered index, the secondary indexes use the primary key values as the pointers. While the leaves of the b-tree of the primary key store rows, the leaves of the b-tree of a secondary index store primary key values.

Let’s assume a table of 1B rows having UUID values as primary key and five secondary indexes. If you read the previous paragraph, you know the primary key values are stored six times for each row. That means a total of 6B char(36) values representing 216 GB. That is just the tip of the iceberg, as tables normally have foreign keys, explicit or not, pointing to other tables. When the schema is based on UUID values, all these columns and indexes supporting them are char(36). I recently analyzed a UUID based schema and found that about 70 percent of storage was for these values.
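The arithmetic behind that figure is easy to reproduce. This Python sketch counts only the raw bytes of the key values themselves, assuming a single-byte character set such as latin1 and ignoring all per-record and per-page overhead; the binary(16) and integer rows are included purely for comparison.

```python
ROWS = 1_000_000_000
COPIES_PER_ROW = 1 + 5  # clustered index + five secondary indexes

def key_storage_gb(bytes_per_key):
    """Raw bytes used by primary key values alone, in decimal GB."""
    return ROWS * COPIES_PER_ROW * bytes_per_key / 1e9

for label, size in [("char(36)", 36), ("binary(16)", 16), ("int unsigned", 4)]:
    print(f"{label:>12}: {key_storage_gb(size):6.0f} GB")
```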

As if that’s not enough, there’s a third important impact of using UUID values. Integer values are compared up to 8 bytes at a time by the CPU, but UUID values are compared character by character. Databases are rarely CPU bound, but nevertheless this adds to the latencies of the queries. If you are not convinced, look at this performance comparison between integers and strings:

mysql> select benchmark(100000000,2=3);
+--------------------------+
| benchmark(100000000,2=3) |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (0.96 sec)

mysql> select benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='df878007-80da-11e9-93dd-00163e000003');
+----------------------------------------------------------------------------------------------------+
| benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='df878007-80da-11e9-93dd-00163e000003') |
+----------------------------------------------------------------------------------------------------+
|                                                                                                  0 |
+----------------------------------------------------------------------------------------------------+
1 row in set (27.67 sec)

Of course, the above example is a worst-case scenario but it at least gives the span of the issue. Comparing integers is about 28 times faster. Even if the difference appears rapidly in the char values, it is still about 2.5 times slower:

mysql> select benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='ef878007-80da-11e9-93dd-00163e000003');
+----------------------------------------------------------------------------------------------------+
| benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='ef878007-80da-11e9-93dd-00163e000003') |
+----------------------------------------------------------------------------------------------------+
|                                                                                                  0 |
+----------------------------------------------------------------------------------------------------+
1 row in set (2.45 sec)

Let’s explore a few solutions to address those issues.

Size of the Values

The default representation for UUID, hash, and token values is often the hexadecimal notation. With a cardinality, the number of possible values, of only 16 per byte, it is far from efficient. What about using another representation like base64 or even straight binary? How much do we save? How is the performance affected?

Let’s begin with the base64 notation. The cardinality of each byte is 64, so it takes 4 bytes in base64 to represent 3 bytes of actual value. A UUID value consists of 16 bytes of data; if we divide by 3, there is a remainder of 1. To handle that, the base64 encoding adds ‘==’ at the end:

mysql> select to_base64(unhex(replace(uuid(),'-','')));
+------------------------------------------+
| to_base64(unhex(replace(uuid(),'-',''))) |
+------------------------------------------+
| clJ4xvczEeml1FJUAJ7+Fg==                 |
+------------------------------------------+
1 row in set (0.00 sec)

If the length of the encoded entity is known, like for a UUID, we can remove the ‘==’, as it is just dead weight. A UUID encoded in base64 thus has a length of 22.
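The sizes are easy to check outside MySQL. The following Python sketch decodes the value returned by the query above and compares the lengths of the three representations; the helper for restoring the padding is hypothetical and only there to show that dropping ‘==’ loses no information.

```python
import base64
import uuid

encoded = "clJ4xvczEeml1FJUAJ7+Fg=="      # value from the query above
raw = base64.b64decode(encoded)
u = uuid.UUID(bytes=raw)

hex_form = str(u)                          # hexadecimal with dashes
b64_form = base64.b64encode(u.bytes).decode("ascii")
b64_short = b64_form.rstrip("=")           # padding dropped

print(len(hex_form), len(b64_form), len(b64_short), len(u.bytes))

def from_short_base64(value):
    """Restore the padding that was dropped and decode back to a UUID."""
    padded = value + "=" * (-len(value) % 4)
    return uuid.UUID(bytes=base64.b64decode(padded))

print(from_short_base64(b64_short) == u)
```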

The next logical step is to directly store the value in binary format. This is the most compact format, but displaying the values in the mysql client is less convenient.

So, how’s the size impacting performance? To illustrate the impact, I inserted random UUID values in a table with the following definition…

CREATE TABLE `data_uuid` (
  `id` char(36) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

… for the default hexadecimal representation. For base64, the ‘id’ column is defined as char(22) while binary(16) is used for the binary example. The database server has a buffer pool size at 128M and its IOPs are limited to 500. The insertions are done over a single thread.

Insertion rates for tables using different representation for UUID values

In all cases, the insertion rate is at first CPU bound, but as soon as the table is larger than the buffer pool, the insertion rapidly becomes IO bound. This is expected and shouldn’t surprise anyone. The use of a smaller representation for the UUID values just allows more rows to fit in the buffer pool, but in the long run it doesn’t really help the performance, as the random insertion order dominates. If you are using random UUID values as primary keys, your performance is limited by the amount of memory you can afford.

Option 1: Saving IOPs with Pseudo-Random Order

As we have seen, the most important issue is the random nature of the values. A new row may end up in any of the table leaf pages. So unless the whole table is loaded in the buffer pool, it means a read IOP and eventually a write IOP. My colleague David Ducos gave a nice solution to this problem but some customers do not want to allow for the possibility of extracting information from the UUID values, like, for example, the generation timestamp.

What if we just reduce the randomness of the values somewhat, in such a way that a prefix of a few bytes is constant for a time interval? During the time interval, only a fraction of the whole table, corresponding to the cardinality of the prefix, would be required to be in memory to save the read IOPs. This would also increase the likelihood that a page receives a second write before being flushed to disk, thus reducing the write load. Let’s consider the following UUID generation function:

drop function if exists f_new_uuid; 
delimiter ;;
CREATE DEFINER=`root`@`%` FUNCTION `f_new_uuid`() RETURNS char(36)
    NOT DETERMINISTIC
BEGIN
    DECLARE cNewUUID char(36);
    DECLARE cMd5Val char(32);


    set cMd5Val = md5(concat(rand(),now(6)));
    set cNewUUID = concat(left(md5(concat(year(now()),week(now()))),4),left(cMd5Val,4),'-',
        mid(cMd5Val,5,4),'-4',mid(cMd5Val,9,3),'-',mid(cMd5Val,13,4),'-',mid(cMd5Val,17,12));

    RETURN cNewUUID;
END;;
delimiter ;

The first four characters of the UUID value come from the MD5 hash of the concatenation of the current year and week number. This value is, of course, static over a week. The remainder of the UUID value comes from the MD5 of a random value and the current time at a precision of 1 µs. The third field is prefixed with a “4” to indicate it is a version 4 UUID type. There are 65536 possible prefixes so, during a week, only 1/65536 of the table rows are required in memory to avoid a read IOP upon insertion. That’s much easier to manage: a 1TB table will need only about 16MB in the buffer pool to support the inserts.
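For a remote device that generates its own IDs, the same idea can be sketched client-side. The Python below is a rough port of the SQL function, not a drop-in equivalent: it uses the ISO week number, which may differ from what MySQL's week() returns, and the function names are invented for this example.

```python
import hashlib
import random
from datetime import datetime

def md5_hex(text):
    return hashlib.md5(text.encode("ascii")).hexdigest()

def new_pseudo_ordered_uuid(now=None):
    """Random-looking UUID whose first four hex digits are fixed for a week."""
    now = now or datetime.now()
    year, week, _ = now.isocalendar()       # ISO week; MySQL week() may differ
    prefix = md5_hex(f"{year}{week}")[:4]   # constant during the week
    rnd = md5_hex(f"{random.random()}{now.isoformat()}")
    # Same slicing as the SQL version: character 12 of the hash is skipped
    # to make room for the version digit "4".
    return (f"{prefix}{rnd[0:4]}-{rnd[4:8]}-4{rnd[8:11]}-"
            f"{rnd[12:16]}-{rnd[16:28]}")

a = new_pseudo_ordered_uuid(datetime(2019, 11, 20, 10, 0))   # Wednesday
b = new_pseudo_ordered_uuid(datetime(2019, 11, 22, 18, 30))  # Friday, same week
print(a, b, sep="\n")
```

Both values share their first four characters because they fall in the same week, while the rest of each value stays random.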

Option 2: Mapping UUIDs to Integers

Even if you use pseudo-ordered UUID values stored using binary(16), it is still a very large data type which will inflate the size of the dataset. Remember the primary key values are used as pointers in the secondary indexes by InnoDB. What if we store all the UUID values of a schema in a mapping table? The mapping table will be defined as:

CREATE TABLE `uuid_to_id` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `uuid` char(36) NOT NULL,
  `uuid_hash` int(10) unsigned GENERATED ALWAYS AS (crc32(`uuid`)) STORED NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_hash` (`uuid_hash`)
) ENGINE=InnoDB AUTO_INCREMENT=2590857 DEFAULT CHARSET=latin1;

It is important to notice the uuid_to_id table does not enforce the uniqueness of uuid. The idx_hash index acts a bit like a bloom filter. We’ll know for sure a UUID value is not present in the table when there is no matching hash value but we’ll have to validate with the stored UUID value when there is a matching hash. To help us here, let’s create a SQL function:

DELIMITER ;;
CREATE DEFINER=`root`@`%` FUNCTION `f_uuid_to_id`(pUUID char(36)) RETURNS int(10) unsigned
    DETERMINISTIC
BEGIN
        DECLARE iID int unsigned;
        DECLARE iOUT int unsigned;

        select get_lock('uuid_lock',10) INTO iOUT;

        SELECT id INTO iID
        FROM uuid_to_id WHERE uuid_hash = crc32(pUUID) and uuid = pUUID;

        IF iID IS NOT NULL THEN
            select release_lock('uuid_lock') INTO iOUT;
            SIGNAL SQLSTATE '23000'
                SET MESSAGE_TEXT = 'Duplicate entry', MYSQL_ERRNO = 1062;
        ELSE
            insert into uuid_to_id (uuid) values (pUUID);
            select release_lock('uuid_lock') INTO iOUT;
            set iID = last_insert_id();
        END IF;

        RETURN iID;
END ;;
DELIMITER ;

The function checks whether the UUID value passed exists in the uuid_to_id table. If it does, the function raises a duplicate entry error (1062); otherwise, it inserts the UUID value and returns the last_insert_id. To protect against the concurrent submission of the same UUID values, I added a database lock. The database lock limits the scalability of the solution. If your application cannot submit the same request twice over a very short time frame, the lock could be removed. I also have another version of the function with no lock calls, which uses a small dedup table where recent rows are kept for only a few seconds. See my github if you are interested.
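The way the hash index narrows the search before the full UUID comparison can be modeled in a few lines of Python. This is a single-threaded, in-memory stand-in written for illustration only; the class and method names are hypothetical, and it has no equivalent of the database lock.

```python
import zlib

class UuidToId:
    """In-memory stand-in for the uuid_to_id table and its idx_hash index."""

    def __init__(self):
        self.rows = {}        # id -> uuid, like the clustered index
        self.by_hash = {}     # crc32(uuid) -> [id, ...], like idx_hash
        self.next_id = 1

    def find(self, value):
        """Return the id for a UUID, or None if it is not stored."""
        h = zlib.crc32(value.encode("ascii"))
        # No bucket means the UUID is definitely absent; a bucket only means
        # "maybe", so each candidate is checked against the stored UUID.
        for row_id in self.by_hash.get(h, []):
            if self.rows[row_id] == value:
                return row_id
        return None

    def add(self, value):
        """Insert a new UUID and return its integer id; reject duplicates."""
        if self.find(value) is not None:
            raise KeyError(f"Duplicate entry {value!r}")
        row_id = self.next_id
        self.next_id += 1
        self.rows[row_id] = value
        h = zlib.crc32(value.encode("ascii"))
        self.by_hash.setdefault(h, []).append(row_id)
        return row_id

mapping = UuidToId()
first = mapping.add("6102ef39-c3f4-4977-80d4-742d15eefe66")
second = mapping.add("14d6e343-028d-48a3-9ec6-77f1b703dc8f")
print(first, second, mapping.find("ac9c7139-34a1-48cf-86cf-a2c823689a91"))
```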

Results for the Alternate Approaches

Now, let’s have a look at the insertion rates using these alternate approaches.

Insertion on tables using UUID values as primary keys, alternative solutions

The pseudo-order results are great. Here I modified the algorithm to keep the UUID prefix constant for one minute instead of one week in order to better fit the test environment. Even if the pseudo-order solution performs well, keep in mind it is still bloating the schema and overall the performance gains may not be that great.

The mapping to integer values decouples the schema from the UUID values, although the insert rates are lower due to the additional DMLs required. The tables now use integers as primary keys. This mapping removes nearly all the scalability concerns of using UUID values. Still, even on a small VM with limited CPU and IOPS, the UUID mapping technique yields nearly 4000 inserts/s. Put into context, this means 14M rows per hour, 345M rows per day and 126B rows per year. Such rates likely fit most requirements. The only factor limiting growth is the size of the hash index. When the hash index becomes too large to fit in the buffer pool, performance will start to decrease.

Other Options than UUID Values?

Of course, there are other possibilities to generate unique IDs. The method used by the MySQL function UUID_SHORT() is interesting. A remote device like a smartphone could use the UTC time instead of the server uptime. Here’s a proposal:

(Seconds since January 1st 1970) << 32
+ (lower 2 bytes of the wifi MAC address) << 16
+ 16_bits_unsigned_int++;

The 16-bit counter should be initialized at a random value and allowed to roll over. The odds of two devices producing the same ID are very small: it has to happen at approximately the same time, and both devices must have the same lower bytes for the MAC address and their 16-bit counters at the same value.
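A minimal Python sketch of this proposal follows. The class name and its parameters are invented for illustration; the fields are combined with bitwise OR, which is equivalent to the additions in the formula because the three fields do not overlap, and the result is meant for an unsigned 64-bit column.

```python
import random
import time

class DeviceIdGenerator:
    """64-bit IDs: 32-bit Unix seconds | 16-bit MAC suffix | 16-bit counter."""

    def __init__(self, mac, counter=None):
        # Keep only the lower two bytes of the 48-bit MAC address.
        self.mac_low = mac & 0xFFFF
        # Start at a random point unless a value is given (useful for tests).
        self.counter = (random.getrandbits(16) if counter is None
                        else counter & 0xFFFF)

    def next_id(self, now=None):
        seconds = int(time.time() if now is None else now) & 0xFFFFFFFF
        value = (seconds << 32) | (self.mac_low << 16) | self.counter
        self.counter = (self.counter + 1) & 0xFFFF   # roll over at 2**16
        return value

gen = DeviceIdGenerator(mac=0x8C16456798F1, counter=0xFFFF)
a = gen.next_id(now=1573656803)
b = gen.next_id(now=1573656803)   # same second, counter has rolled over
print(f"{a:016x}", f"{b:016x}", sep="\n")
```

Because the seconds occupy the high bits, values generated later sort after earlier ones from any device, which keeps inserts close to primary key order.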

Notes

All the data related to this post can be found in my github.

↧

Comparing S3 Streaming Tools with Percona XtraBackup

November 26, 2019, 8:07 am

Making backups over the network can be done in two ways: either save on disk and transfer or just transfer without saving. Both ways have their strong and weak points. The second way, particularly, is highly dependent on the upload speed, which would either reduce or increase the backup time. Other factors that influence it are chunk size and the number of upload threads.

Percona XtraBackup 2.4.14 has gained S3 streaming, which is the capability to upload backups directly to S3-compatible storage without saving them locally first. This feature was developed because we wanted to improve the upload speeds of backups in Percona Operator for XtraDB Cluster.

There are many implementations of S3 Compatible Storage: AWS S3, Google Cloud Storage, Digital Ocean Spaces, Alibaba Cloud OSS, MinIO, and Wasabi.

We’ve measured the speed of AWS CLI, gsutil, MinIO client, rclone, gof3r and the xbcloud tool (part of Percona XtraBackup) on AWS (in single and multi-region setups) and on Google Cloud. XtraBackup was compared in two variants: a default configuration and one with tuned chunk size and number of upload threads.

Here are the results.

AWS (Same Region)

The backup data was streamed from the AWS EC2 instance to the AWS S3, both in the us-east-1 region.

tool	settings	CPU	max mem	speed	speed comparison
AWS CLI	default settings	66%	149Mb	130MiB/s	baseline
AWS CLI	10Mb block, 16 threads	68%	169Mb	141MiB/s	+8%
MinIO client	not changeable	10%	679Mb	59MiB/s	-55%
rclone rcat	not changeable	102%	7138Mb	139MiB/s	+7%
gof3r	default settings	69%	252Mb	97MiB/s	-25%
gof3r	10Mb block, 16 threads	77%	520Mb	108MiB/s	-17%
xbcloud	default settings	10%	96Mb	25MiB/s	-81%
xbcloud	10Mb block, 16 threads	60%	185Mb	134MiB/s	+3%
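The last column is simply each speed relative to the baseline row. As a quick check, this Python sketch recomputes it from the speeds in the table above; the dictionary labels are shortened versions of the tool and settings columns.

```python
def vs_baseline(speed, baseline):
    """Relative difference to the baseline, in whole percent."""
    return round((speed / baseline - 1) * 100)

BASELINE = 130  # MiB/s, AWS CLI with default settings
same_region = {
    "AWS CLI, tuned": 141,
    "MinIO client": 59,
    "rclone rcat": 139,
    "gof3r, default": 97,
    "gof3r, tuned": 108,
    "xbcloud, default": 25,
    "xbcloud, tuned": 134,
}
for tool, speed in same_region.items():
    print(f"{tool:<18} {vs_baseline(speed, BASELINE):+d}%")
```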

Tip: If you run MySQL on an EC2 instance to make backups inside one region, do snapshots instead.

AWS (From US to EU)

The backup data was streamed from AWS EC2 in us-east-1 to AWS S3 in eu-central-1.

tool	settings	CPU	max mem	speed	speed comparison
AWS CLI	default settings	31%	149Mb	61MiB/s	baseline
AWS CLI	10Mb block, 16 threads	33%	169Mb	66MiB/s	+8%
MinIO client	not changeable	3%	679Mb	20MiB/s	-67%
rclone rcat	not changeable	55%	9307Mb	77MiB/s	+26%
gof3r	default settings	69%	252Mb	97MiB/s	+59%
gof3r	10Mb block, 16 threads	77%	520Mb	108MiB/s	+77%
xbcloud	default settings	4%	96Mb	10MiB/s	-84%
xbcloud	10Mb block, 16 threads	59%	417Mb	123MiB/s	+101%

Tip: Think about disaster recovery and what you will do when a whole region is not available. It makes no sense to back up to the same region; always transfer backups to another region.

Google Cloud (From US to EU)

The backup data was streamed from a Compute Engine instance in us-east1 to Cloud Storage in europe-west3. Interestingly, Google Cloud Storage supports both its native protocol and the S3 (interoperability) API. So, Percona XtraBackup can transfer data to Google Cloud Storage directly via the S3 (interoperability) API.

tool	settings	CPU	max mem	speed	speed comparison
gsutil	not changeable, native protocol	8%	246Mb	23MiB/s	baseline
rclone rcat	not changeable, native protocol	6%	61Mb	16MiB/s	-30%
xbcloud	default settings, s3 protocol	3%	97Mb	9MiB/s	-61%
xbcloud	10Mb block, 16 threads, s3 protocol	50%	417Mb	133MiB/s	+478%

Tip: A cloud provider can block your account for many reasons, such as human or robot mistakes, inappropriate content abuse after hacking, credit card expiration, sanctions, etc. Think about disaster recovery and what you will do if a cloud provider blocks your account; it may make sense to back up to another cloud provider or on-premises.

Conclusion

With tuned settings, the xbcloud tool (part of Percona XtraBackup) is 2-5 times faster over long distances than the native cloud vendor tools, and it is 14% faster and requires 20% less memory than comparable tools with the same settings. Also, xbcloud is the most reliable tool for transferring backups to S3-compatible storage, for two reasons:

It calculates md5 sums during the uploading and puts them into a .md5/filename.md5 file and verifies sums on the download (gof3r does the same).
xbcloud sends data in 10mb chunks and resends them if any network failure happens.
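The general pattern behind these two points, chunked upload with a running checksum and per-chunk retry, can be sketched in Python. This is a generic illustration and not how xbcloud is implemented: `send_chunk` is a hypothetical transport callback, and the retry count is an arbitrary choice.

```python
import hashlib
import io

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, matching the chunk size mentioned above

def upload_stream(stream, send_chunk, chunk_size=CHUNK_SIZE, max_retries=3):
    """Send a stream chunk by chunk, retrying failed chunks; return the MD5 hex."""
    digest = hashlib.md5()
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)               # checksum is computed while uploading
        for attempt in range(1, max_retries + 1):
            try:
                send_chunk(index, chunk)   # hypothetical transport callback
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise                  # give up after the last attempt
        index += 1
    return digest.hexdigest()

# Simulated storage that fails once, to exercise the retry path.
stored = {}
failures = {"left": 1}

def flaky_send(index, chunk):
    if failures["left"] > 0:
        failures["left"] -= 1
        raise ConnectionError("simulated network failure")
    stored[index] = chunk

payload = b"x" * 25
checksum = upload_stream(io.BytesIO(payload), flaky_send, chunk_size=10)
downloaded = b"".join(stored[i] for i in sorted(stored))
print(hashlib.md5(downloaded).hexdigest() == checksum)  # verify on "download"
```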

PS: Please find instructions on GitHub if you would like to reproduce this article’s results.

↧

Running PMM1 and PMM2 Clients on the Same Host

November 27, 2019, 6:09 am

Want to try out Percona Monitoring and Management 2 (PMM 2) but you’re not ready to turn off your PMM 1 environment? This blog is for you! Keep in mind that the methods described are not intended to be a long-term migration strategy, but rather, simply a way to deploy a few clients in order to sample PMM 2 before you commit to the upgrade. 🙂

Here are step-by-step instructions for deploying PMM 1 & 2 client functionality i.e. pmm-client and pmm2-client, on the same host.

1. Deploy PMM 1 on Server1 (you’ve probably already done this)
2. Install and set up pmm-client for connectivity to Server1
3. Deploy PMM 2 on Server2
4. Install and set up pmm2-client for connectivity to Server2
5. Remove pmm-client and switch completely to pmm2-client

The first few steps are already described in our PMM1 documentation so we are simply providing links to those documents. Here we’ll focus on steps 4 and 5.

Install and Setup pmm2-client Connectivity to Server2

It’s not possible to install both clients from a repository at the same time. So you’ll need to download a tarball of pmm2-client. Here’s a link to the latest version directly from our site.

Download pmm2-client Tarball

* Note that depending on when you’re seeing this, the commands below may not be for the latest version, so the commands may need to be updated for the version you downloaded.

$ wget https://www.percona.com/downloads/pmm2/2.1.0/binary/tarball/pmm2-client-2.1.0.tar.gz

Extract Files From pmm2-client Tarball

$ tar -zxvf pmm2-client-2.1.0.tar.gz 
$ cd pmm2-client-2.1.0

Register and Generate Configuration File

Now it’s time to set up a PMM 2 client. In our example, the PMM2 server IP is 172.17.0.2 and the monitored host IP is 172.17.0.1.

$ ./bin/pmm-agent setup --config-file=config/pmm-agent.yaml \
--paths-node_exporter="$PWD/bin/node_exporter" \
--paths-mysqld_exporter="$PWD/bin/mysqld_exporter" \
--paths-mongodb_exporter="$PWD/bin/mongodb_exporter" \
--paths-postgres_exporter="$PWD/bin/postgres_exporter" \
--paths-proxysql_exporter="$PWD/bin/proxysql_exporter" \
--server-insecure-tls --server-address=172.17.0.2:443 \
--server-username=admin --server-password="admin" 172.17.0.1 generic node8.ca

Start pmm-agent

Let’s run pmm-agent inside a screen session. There’s no service manager integration when deploying alongside pmm-client, so if your server restarts, pmm-agent won’t automatically resume.

# screen -S pmm-agent

$ ./bin/pmm-agent --config-file="$PWD/config/pmm-agent.yaml"
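
If screen is not available on the host, one alternative is to detach the process with nohup. This is a different technique from the screen session used in this post, and the helper below is a hypothetical sketch: the function name, log file, and PID file are assumptions. As with screen, nothing restarts the agent after a reboot.

```shell
# Hypothetical helper: run a command detached from the terminal with nohup.
# $1: log file, $2: PID file, remaining arguments: the command to run.
start_detached() {
    detached_log=$1
    detached_pidfile=$2
    shift 2
    # Send stdout and stderr to the log file and run in the background.
    nohup "$@" > "$detached_log" 2>&1 &
    # Remember the PID so the process can be stopped later with kill.
    echo $! > "$detached_pidfile"
}
```

For example: start_detached pmm-agent.log pmm-agent.pid ./bin/pmm-agent --config-file="$PWD/config/pmm-agent.yaml"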

Check the Current State of the Agent

$ ./bin/pmm-admin list
Service type  Service name         Address and port  Service ID

Agent type                  Status     Agent ID                                        Service ID
pmm-agent                   connected  /agent_id/805db700-3607-40a9-a1fa-be61c76fe755  
node_exporter               running    /agent_id/805eb8f6-3514-4c9b-a05e-c5705755a4be

Add MySQL Service

Detach the screen, then add the mysql service:

$ ./bin/pmm-admin add mysql --use-perfschema --username=root mysqltest
MySQL Service added.
Service ID  : /service_id/28c4a4cd-7f4a-4abd-a999-86528e38992b
Service name: mysqltest

Here is the state of pmm-agent:

$ ./bin/pmm-admin list
Service type  Service name         Address and port  Service ID
MySQL         mysqltest            127.0.0.1:3306    /service_id/28c4a4cd-7f4a-4abd-a999-86528e38992b

Agent type                  Status     Agent ID                                        Service ID
pmm-agent                   connected  /agent_id/805db700-3607-40a9-a1fa-be61c76fe755   
node_exporter               running    /agent_id/805eb8f6-3514-4c9b-a05e-c5705755a4be   
mysqld_exporter             running    /agent_id/efb01d86-58a3-401e-ae65-fa8417f9feb2  /service_id/28c4a4cd-7f4a-4abd-a999-86528e38992b
qan-mysql-perfschema-agent  running    /agent_id/26836ca9-0fc7-4991-af23-730e6d282d8d  /service_id/28c4a4cd-7f4a-4abd-a999-86528e38992b
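
If you would rather check this from a script than by eye, the agent table can be filtered with awk. This is only a sketch: the function name is hypothetical, and it assumes the column layout of the 2.1.0 output shown above and treats any status other than "running" or "connected" as a problem.

```shell
# Hypothetical helper: read `pmm-admin list` output on stdin and print every
# agent whose status is neither "running" nor "connected".
# Returns 1 if at least one such agent is found, 0 otherwise.
# Assumption: the agent table follows a header line starting with "Agent type".
check_pmm_agents() {
    awk '
        /^Agent type/ { in_agents = 1; next }   # agent rows start after this header
        in_agents && NF >= 2 {
            if ($2 != "running" && $2 != "connected") {
                print $1 ": " $2
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Usage: ./bin/pmm-admin list | check_pmm_agents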

Confirm you can see activity in each of the two PMM Servers:

PMM 1	PMM 2

Remove pmm-client and Switch Completely to pmm2-client

Once you’ve decided to move over completely to PMM 2, it’s better to switch from the tarball version to an installation from the repository. This makes client updates much easier and registers the new agent as a service that starts automatically with the server. We will also show you how to make the switch without re-adding monitored instances.

Configure Percona Repositories

$ sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm 
$ sudo percona-release disable all 
$ sudo percona-release enable original release 
$ yum list | grep pmm 
pmm-client.x86_64                    1.17.2-1.el6                  percona-release-x86_64
pmm2-client.x86_64                   2.1.0-1.el6                   percona-release-x86_64

Here is a link to the apt variant.

Remove pmm-client

$ yum remove pmm-client

Install pmm2-client

$ yum install pmm2-client
Loaded plugins: priorities, update-motd, upgrade-helper
4 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package pmm2-client.x86_64 0:2.1.0-5.el6 will be installed
...
Installed:
  pmm2-client.x86_64 0:2.1.0-5.el6                                                                                                                                                           

Complete!

Configure pmm2-client

Let’s copy the pmm2-client configuration file that is currently in use so that we don’t have to re-add the monitored instances.

$ cp pmm2-client-2.1.0/config/pmm-agent.yaml /tmp

The file must be updated with the new location of the exporters (/usr/local/percona/pmm2/exporters/).

$ sed -i 's|node_exporter:.*|node_exporter: /usr/local/percona/pmm2/exporters/node_exporter|g' /tmp/pmm-agent.yaml
$ sed -i 's|mysqld_exporter:.*|mysqld_exporter: /usr/local/percona/pmm2/exporters/mysqld_exporter|g' /tmp/pmm-agent.yaml
$ sed -i 's|mongodb_exporter:.*|mongodb_exporter: /usr/local/percona/pmm2/exporters/mongodb_exporter|g' /tmp/pmm-agent.yaml 
$ sed -i 's|postgres_exporter:.*|postgres_exporter: /usr/local/percona/pmm2/exporters/postgres_exporter|g' /tmp/pmm-agent.yaml
$ sed -i 's|proxysql_exporter:.*|proxysql_exporter: /usr/local/percona/pmm2/exporters/proxysql_exporter|g' /tmp/pmm-agent.yaml
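
The five substitutions differ only in the exporter name, so they can also be written as a loop. This sketch applies the same sed expression as above; the function name is hypothetical.

```shell
# Hypothetical helper: point every exporter path in a pmm-agent.yaml at one
# directory, using the same sed substitution as the individual commands.
# $1: configuration file to edit in place
# $2: directory that holds the exporter binaries (no trailing slash)
set_exporter_dir() {
    for exporter in node_exporter mysqld_exporter mongodb_exporter \
                    postgres_exporter proxysql_exporter; do
        sed -i "s|${exporter}:.*|${exporter}: $2/${exporter}|g" "$1"
    done
}
```

Usage: set_exporter_dir /tmp/pmm-agent.yaml /usr/local/percona/pmm2/exporters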

The default configuration file has to be replaced by our file and the service pmm-agent has to be restarted.

$ cp /tmp/pmm-agent.yaml /usr/local/percona/pmm2/config/
$ systemctl restart pmm-agent

Check Monitored Services

Now we can verify the current state of the monitored instances.

$ pmm-admin list

You can also check this on the PMM server side.

↧

Percona XtraBackup 8.0.8, Percona Server for MongoDB 3.6.15-3.5: Release Roundup 12/2/2019

December 2, 2019, 8:41 am

≫ Next: Webinar 12/5: Introduction to MySQL Query Tuning for DevOps

≪ Previous: Running PMM1 and PMM2 Clients on the Same Host

It’s release roundup time here at Percona!

Today’s release post encompasses Percona releases from November 19, 2019 – December 2, 2019. Every few weeks, each roundup showcases the latest in software updates, tools, and features to help you manage and deploy our software, with highlights and critical information, as well as links to the full release notes and direct links to the software or service itself.

Percona XtraBackup 8.0.8

On November 20, we released Percona XtraBackup 8.0.8. This release focuses on support for the log archiving feature in PXB 8.0, the creation of renewable checkpoints for the MyRocks storage engine, and two new options (--backup-lock-timeout and --backup-lock-retry-count) for configuring the timeout for acquiring metadata locks in FLUSH TABLES WITH READ LOCK, LOCK TABLE FOR BACKUP, and LOCK BINLOG FOR BACKUP statements. Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime.

Download Percona XtraBackup 8.0.8

Percona Server for MongoDB 3.6.15-3.5

On November 26th, Percona Server for MongoDB 3.6.15-3.5 was released. It is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.6 Community Edition. It supports MongoDB 3.6 protocols and drivers. Percona Server for MongoDB 3.6.15-3.5 adds the ability to stream hot backups to remote Amazon S3 or compatible storage such as MinIO; this feature is now included in all our active MongoDB releases (3.6, 4.0, and 4.2).

Download Percona Server for MongoDB 3.6.15-3.5

↧

Webinar 12/5: Introduction to MySQL Query Tuning for DevOps

December 3, 2019, 6:14 am

≫ Next: Percona Live 2020: Call For Papers

≪ Previous: Percona XtraBackup 8.0.8, Percona Server for MongoDB 3.6.15-3.5: Release Roundup 12/2/2019

MySQL does its best to return requested bytes as fast as possible. However, it needs human help to identify what is important and should be accessed in the first place. Well-written queries can significantly outperform automatically generated ones. Indexes and optimizer statistics, including but not limited to histograms, help speed up queries.

Join Percona’s Principal Support Engineer for MySQL Sveta Smirnova on Thurs, Dec 5th from 10 to 11 am PST to learn how MySQL query performance can be improved through the utilization of Developer and DevOps tools. In addition, you’ll learn troubleshooting techniques to help identify and solve query performance issues.

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

↧

Percona Live 2020: Call For Papers

December 3, 2019, 7:29 am

≫ Next: PostgreSQL 12 Improvement: Benign Log Entries “Incomplete Startup Packet”

≪ Previous: Webinar 12/5: Introduction to MySQL Query Tuning for DevOps

The Call For Papers (CFP) for Percona Live 2020 is now open!

Percona Live will be held in Austin, Texas from Monday, May 18 through Wednesday, May 20, 2020 at a new venue, the AT&T Hotel and Conference Center. The CFP is open for submissions from November 27, 2019, through January 13, 2020. We invite abstracts covering any and all aspects of open source databases, including on-premise, in the cloud, and across the multi-verse!

Hot Open Source Topics for 2020

All Open Source database themes are welcome, but these are our hot topics for 2020:

Success in the multi-verse: How to optimize performance, architecture, high-availability, replication, and more in a multi-cloud, multi-database environment.
Support of cloud-native applications in database environments: How you burst scale and performance when you need it.
Managing systems at scale: How to manage 1000’s of databases effectively.
Finding and solving problems quickly: How you keep systems up and running in the heat of an outage or a slowdown.
Data security: How you prevent your database from leaking data.
Development: Best practices for enabling developers to self-support and make database choices.

All abstracts will get a full, fair, and competitive assessment by our Conference Program Committee of open source database experts. We’re currently finalizing our committee membership, which will be announced soon.

Sponsorship Opportunities

The conference will also include presentations by sponsoring companies that operate on the leading edge of open source technology. Many of our sponsors are pivotal players in the industry and make important contributions to open source projects. To learn more about sponsorship opportunities, contact Bronwyn Campbell.

Key Points

Proposals can be for half-day or full-day tutorials, 50-minute conference sessions, 25-minute conference sessions, or 10-minute lightning talks.
A talk can be shared by up to four speakers.
All speakers, except lightning talk speakers, receive a full, free pass to Percona Live.
The closing date for proposals is 11:59 p.m. AoE (GMT -12) on Monday, January 13, 2020.
We may select some proposals early, before the CFP closes, to announce an agenda sneak peek. So the earlier you submit, the better your chance of success.

If you have any questions about the CFP or the conference, don’t hesitate to get in touch! You can contact me via email at community-team@percona.com. Meanwhile, if you are ready to register then sign up now – you can save your submission in progress, so there’s no need to do it all in one session.

Good luck!

↧

PostgreSQL 12 Improvement: Benign Log Entries “Incomplete Startup Packet”

December 3, 2019, 9:38 am

≫ Next: Ensure Your Database Infrastructure Supports New Year Revenue Goals

≪ Previous: Percona Live 2020: Call For Papers

There is a less-talked-about improvement in PostgreSQL 12 which can greatly reduce the benign log entries. This patch is probably one of the smallest in PostgreSQL 12.

The commit message says:

Don't log incomplete startup packet if it's empty

This will stop logging cases where, for example, a monitor opens a
connection and immediately closes it. If the packet contains any data an
incomplete packet will still be logged.

Author: Tom Lane

This patch is going to improve the experience of many enterprise users by reducing unwanted log entries. It is very common to see the PostgreSQL log file running into several GBs due mainly to such unwanted benign entries.

You can read the full discussion thread at postgresql.org.

Background

In PostgreSQL, a backend process is created for each client connection request to the Postmaster (which listens on port 5432 by default). The backend then processes the startup packet from the client. Refer to src/backend/postmaster/postmaster.c for the source code. Each client connection request is expected to send a startup message to the PostgreSQL server, and the information in the startup packet is used for setting up the backend process. But there are many more things happening when we deploy PostgreSQL in a datacenter. There could be different monitoring solutions, security scanners, port scanners, HA solutions, etc. hitting PostgreSQL port 5432. PostgreSQL starts processing these incoming connections in order to establish a client-server communication channel, but many of these tools have a different intention and won’t follow the proper client-server protocol. Historically, PostgreSQL generated a log entry for each of these suspected/bad hits. This can result in log files growing to a huge size and can cause unwanted log-related I/O.

Even though it looks silly, this was so annoying that many tool vendors started documenting it for their customers, advising them to just ignore such messages, as we can see here. HA Solutions like Stolon reported a similar problem. Monitoring plugins for Nagios, Cacti, and Zabbix also caused the same, and it appeared in the PostgreSQL mailing list multiple times over several years. For example:

Reproducing the Case

Any port scanner or TCP port checker can cause these log entries. The ncat/nc utility has a zero-I/O mode (the -z option) that only reports the connection status.

for i in {1..100}; do
     nc -zv localhost 5432 ;
done

This produces LOG entries like the following for PostgreSQL versions up to 11:

2019-11-28 13:24:26.501 UTC [15168] LOG: incomplete startup packet
2019-11-28 13:24:26.517 UTC [15170] LOG: incomplete startup packet
2019-11-28 13:24:26.532 UTC [15172] LOG: incomplete startup packet
2019-11-28 13:24:26.548 UTC [15174] LOG: incomplete startup packet
2019-11-28 13:24:26.564 UTC [15176] LOG: incomplete startup packet
2019-11-28 13:24:26.580 UTC [15178] LOG: incomplete startup packet
2019-11-28 13:24:26.595 UTC [15180] LOG: incomplete startup packet
2019-11-28 13:24:26.611 UTC [15182] LOG: incomplete startup packet
2019-11-28 13:24:26.627 UTC [15184] LOG: incomplete startup packet
2019-11-28 13:24:26.645 UTC [15186] LOG: incomplete startup packet
2019-11-28 13:24:26.666 UTC [15188] LOG: incomplete startup packet
2019-11-28 13:24:26.687 UTC [15190] LOG: incomplete startup packet
2019-11-28 13:24:26.710 UTC [15192] LOG: incomplete startup packet
2019-11-28 13:24:26.729 UTC [15194] LOG: incomplete startup packet
2019-11-28 13:24:26.748 UTC [15196] LOG: incomplete startup packet
...

But PostgreSQL 12 detects that these are zero-size packets and simply ignores them, so there won’t be any entries in the log file.
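
To get a feel for how much of an existing log these entries account for before upgrading, you can count them. The following is a small sketch: the function name is hypothetical, and the pattern assumes the message text shown in the excerpt above.

```shell
# Hypothetical helper: report how many lines of a PostgreSQL log are
# "incomplete startup packet" entries, next to the total line count.
# $1: path to the log file
count_incomplete_startup() {
    # " *" tolerates one or more spaces after "LOG:".
    # grep -c exits non-zero when nothing matches, hence "|| true".
    benign=$(grep -c 'LOG: *incomplete startup packet' "$1" || true)
    total=$(wc -l < "$1")
    echo "$benign of $((total)) lines"
}
```

Usage: count_incomplete_startup postgresql-Thu.log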

Additional Note

Unfortunately, some tools are not gentle enough to send a zero-size packet; they send some data. As per the consensus in the community, such cases still need to be logged.

2019-11-28 14:27:49.728 UTC [17982] LOG: invalid length of startup packet
2019-11-28 14:28:14.907 UTC [17983] LOG: invalid length of startup packet
2019-11-28 14:28:18.236 UTC [17984] LOG: invalid length of startup packet

Tom Lane explained in the mailing list:

" The agreed-to behavior change was to not log anything if the connection is closed without any data having been sent. If the
client *does* send something, and it doesn't look like a valid connection request, I think we absolutely should log that."

So we should expect to see such messages in PostgreSQL 12 also. We can simulate the problem by using telnet instead of the nc command to check the open port. Some other tools abruptly end their connections, which causes errors in libpq (the library that implements the PostgreSQL network protocol).

2019-11-28 14:11:45.273 UTC [17951] LOG: could not receive data from client: Connection reset by peer
2019-11-28 14:11:47.328 UTC [17953] LOG: could not receive data from client: Connection reset by peer
2019-11-28 14:11:48.425 UTC [17955] LOG: could not receive data from client: Connection reset by peer
2019-11-28 14:27:11.870 UTC [17978] LOG: could not receive data from client: Connection reset by peer

Such entries are also not going to go away. They appear when the server process tries to read packets sent from the client side (refer to the pq_recvbuf function in src/backend/libpq/pqcomm.c) and then realizes that the client side is already gone, which means that the client ended the communication without a proper handshake.

However, there will be some savings. Some tools, like nmap, used to produce both a libpq error and a zero-size packet entry in PostgreSQL 11, as below:

2019-11-28 13:53:31.721 UTC [15446] LOG: could not receive data from client: Connection reset by peer
2019-11-28 13:53:31.721 UTC [15446] LOG: incomplete startup packet
2019-11-28 13:54:04.014 UTC [15450] LOG: could not receive data from client: Connection reset by peer
2019-11-28 13:54:04.014 UTC [15450] LOG: incomplete startup packet
2019-11-28 14:01:55.514 UTC [15479] LOG: could not receive data from client: Connection reset by peer
2019-11-28 14:01:55.514 UTC [15479] LOG: incomplete startup packet

In PostgreSQL 12, this pair of entries is reduced to a single one:

could not receive data from client: Connection reset by peer

Final Word

Many tools used across data centers are undergoing improvements for better compatibility with PostgreSQL. As I mentioned in my previous blog post, Configure HAProxy with PostgreSQL Using Built-in pgsql-check, recent improvements to HAProxy like this commit improve the disconnection of pgsql-check. So messages like LOG: could not receive data from client: Connection reset by peer may not appear in the logs anymore.

When someone wants to transmit a very large log file with a lot of these benign entries for any purpose, including external support, it may be worth removing these entries from the log file before transmitting/sharing them. For example, a simple sed command as follows could remove all startup packet related log entries:

sed -i '/startup packet/d' postgresql-Thu.log
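
Note that the pattern above matches every line containing "startup packet", so it also removes the "invalid length of startup packet" entries, and it edits the file in place. If you prefer to keep the original log and drop only the empty-packet entries, a variant could look like the sketch below; the function name and the output file name are hypothetical.

```shell
# Hypothetical helper: write a copy of a PostgreSQL log without
# "incomplete startup packet" lines, leaving the original file untouched.
# $1: source log file
# $2: destination for the filtered copy
strip_incomplete_startup() {
    # " *" tolerates one or more spaces after "LOG:".
    sed '/LOG: *incomplete startup packet/d' "$1" > "$2"
}
```

Usage: strip_incomplete_startup postgresql-Thu.log postgresql-Thu.filtered.log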

A less bloated PostgreSQL log file could be an added benefit when you upgrade to PostgreSQL 12, which is one more reason for doing so.

↧

Ensure Your Database Infrastructure Supports New Year Revenue Goals

December 4, 2019, 6:38 am

≫ Next: MySQL Encryption: Talking About Keyrings

≪ Previous: PostgreSQL 12 Improvement: Benign Log Entries “Incomplete Startup Packet”

Technology is the great enabler for all modern business growth, and databases of some sort or another underpin all modern business technology. Database technology is the infrastructure upon which modern applications and businesses are built. This is no different than what has occurred in the past with other technologies. For example, new methods of transportation such as air travel and long-distance shipping created the basis for the import/export industry we all depend on now. And manufacturing as we know it was enabled by electricity, assembly lines, and robotic machines that can do repetitive jobs 24 hours a day. Technology always marches on, and with it can come drastic changes.

Revenue and profitability are the most used measures of success in modern business. Wall Street and investors scrutinize revenue growth, and along the way, they look for predictors of future success. They look for companies that are innovative, disruptive, and have the right infrastructure in place (team, momentum, sales process, R&D, etc.), and they bet big on companies whose trajectory is upwards.

Most people don’t often associate database infrastructure with revenue growth and profitability. However, just like having electricity at a manufacturing plant, it is absolutely critical to have this foundational technology set up correctly to enable the growth of the rest of the company.

There are three checkboxes in database infrastructure that are the litmus test for ensuring the rest of the organization can grow at scale and as fast as needed:

It must be designed to be secure and available by default. Data is a company’s most critical asset, and the company owes it to customers, shareholders, and employees to keep it safe.
It must be set up to meet the expectations of your users (internal or external) around outages, uptime, usability, etc., which is easier said than done as expectations are skyrocketing.
It must be the right infrastructure for your unique needs/fingerprint, making a foundation that not only increases efficiency but also lowers cost.

These checkboxes may sound simple, and on paper, they are. But realistically, most companies often struggle in some form or another with all three of these. This can limit their growth and success.

Data is the Lifeblood of Modern Business

There is no doubt that data is the lifeblood, or currency, of modern businesses. You cannot find a modern business that is not using data to try and accelerate revenue, reduce costs, and make business decisions. Storing vast amounts of data to analyze and utilize is the new normal, but this new normal comes with expectations and requirements.

Don’t Let It Leak: Your Reputation and Business is at stake

Keeping control of your data is paramount to the growth and future viability of any business. Losing control of your data is incredibly costly, as not only can you be fined (GDPR, HIPAA, etc.) or face jail time in some cases, but you also risk revenue and reputation loss as you lose the trust and business of your customers. Once you lose the trust of your customers, it is very hard and costly to regain it. Think about it: what would it take to win your business back if someone leaked personal details about you?

GDPR Impact

In just the first 8 months after GDPR came into effect, there were more than forty thousand reported data breaches, and 91 companies were fined. The biggest fine came in July, when British Airways was fined £183.39m ($236 million USD) for infringements of the data protection law.

Here are just a few of the headlines reporting database-related leaks in 2019:

Data Breach At DoorDash Compromised Privacy Of 4.9 Million People

Open database leaked 179GB in customer, US government, and military records

7.5 Million Records of Adobe Creative Cloud User Data Exposed

Honda Motors Company databases leaked 40GB of employee data

885 million sensitive financial records left exposed by First American

Over 540 million Facebook records found on exposed AWS servers

Massive leak exposed 809 million email addresses and other records

Cloud database removed after exposing details on 80 million US households

Unsecured Databases Leak 60 Million Records of Scraped LinkedIn Data

Chinese companies have leaked over 590 million resumes via open databases

Unsecured MongoDB database exposes real-time locations of families

Database leaks 250K legal documents, some marked ‘not designated for publication’

Docker Hub Database Hack Exposes Sensitive Data of 190K Users

You don’t have to look far to see the growing number of database breaches and leaks taking their toll. These have taken place in every conceivable industry and for companies both small and large. The sad state is that most of these could easily be avoided. In fact, most are caused by human error or overlooked steps in security practices.

Meeting User Expectations

User expectations are higher than ever. A recent Google study found that if your application does not respond in three seconds or less, more than half of mobile website users will leave your site. And with the majority of web traffic coming from mobile devices, this could really affect your bottom line.

You cannot survive in business if your users are not taken care of.

Whether it’s your customers or your internal team, users unable to get what they need when they need it leads to frustration. So how can you set up your database infrastructure to ensure you exceed user expectations?

Build flexible systems that allow you to add database nodes, slaves, or other components to scale up and down to meet the demands of the application
Upfront design is super important, as sometimes bad design locks you into poor performance down the road
Backup, backup, backup (Percona XtraBackup performs a hot backup of your MySQL data while systems are running)
Test and review performance every few months
Use our Cost of Database Downtime Calculator to see just how much downtime could cost you

Build the Right Infrastructure

It has never been easier than it is today to find, add, and use new technology to help move your business forward. Often you will see individuals try new technology that helps them quickly analyze and understand critical trends and patterns in their data. The speed at which they can do this can be a game-changer, but as we learned in the movie Spider-Man: “with great power, comes great responsibility.” Used incorrectly, technology can cause massive disruptions.

In the past when you wanted to use a new tool or add new software into your environment, you had to go through a process of involving IT to provision new hardware, get the servers set up and configured, and gain access. This process acted as a gatekeeper of sorts, enabling the technology experts to ask critical questions and offer better or more secure ways to handle the same thing. The rise of XaaS and infrastructure now enables many users and developers to circumvent much of “IT” and do small things without the centralized oversight that existed in the past.

This is especially true in the database space. Database vendors have spent a great deal of time making it easy for anyone to get started. In fact, the popularity of many database technologies (MongoDB, Elastic, and others) is fueled by how easy they have made it to get started. Taking that a step further, adding a database in any cloud provider is dead simple. Now not only can developers do it without assistance from a DBA or a database expert, but any user in your company can also do it with a click. Although vendors advertise security as a “Shared Responsibility,” the reality is that your application, your end-user access, and your usage are ultimately your responsibility. This is where many database leaks happen: someone moves data over to an external service, they don’t know how to properly secure the data, and it is left exposed.

If you build it, they will come

In order to build the right database infrastructure from the start, you will need to keep in mind…

There is no single database that does it all, and there are many different vendor, cloud, and XaaS options. Make sure you do your research to find the ones that best meet your needs.
Keeping your database portable helps you avoid vendor lock-in. Before signing any agreement(s), know that you can negotiate entry/exit plans should you wish to move vendors. And be sure to keep an eye on auto-renewals!
Being cloud-native is key as methodologies, enhancements, and tools for applications have outpaced databases.
Build resilient and fault-tolerant applications and systems from the beginning, and plan for them to fail. If you keep this in mind, you are better prepared for the day it actually happens.
A large portion of the community employs a multi-vendor database strategy, and you would be wise to do the same. Every vendor may not provide everything you need for your applications; so think about aligning them with each vendor’s strength.
There is no AI, automation, or tool that can completely handle all database-related needs.

Conclusion

It comes down to this: if you want to grow revenue, you’re going to need to make sure your database infrastructure is set up to enable you to do so. If your applications aren’t up and running, if you are losing data, if your customers don’t trust you, and if your customers (or employees) can’t get what they need when they need it, you’re going to have a hard time building revenue.

Technology was, is, and will continue to be a driving force for modern business growth, and the management of data and databases is nothing to take lightly. Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely and affordably maintain business agility, minimize risks, and stay competitive. Our experts can maximize your application performance with our open source database support, managed services or consulting for MySQL, MariaDB, MongoDB, and PostgreSQL in on-premises and cloud environments.

So if you’re in the early stages of revenue planning for 2020, or finalizing your technology budgets, we can help. We have extensive experience advising companies on the best way to configure, manage, and run databases to ensure that not only are they secure but they also continue to drive revenue.

To learn more, please contact us at 1-888-316-9775 or 0-800-051-8984 in Europe or have us contact you.

↧

MySQL Encryption: Talking About Keyrings

December 9, 2019, 7:49 am

≫ Next: Open Source: What You Do Today Impacts Billions of People

≪ Previous: Ensure Your Database Infrastructure Supports New Year Revenue Goals

It has been possible to enable Transparent Data Encryption (TDE) in Percona Server for MySQL/MySQL for a while now, but have you ever wondered how it works under the hood and what kind of implications TDE can have on your server instance? In this blog post series, we are going to have a look at how TDE works internally. First, we talk about keyrings, as they are required for any encryption to work. Then we explore in detail how encryption in Percona Server for MySQL/MySQL works and what extra encryption features Percona Server for MySQL provides.

MySQL Keyrings

Keyrings are plugins that allow a server to fetch/create/delete keys in a local file (keyring_file) or on a remote server (for example, HashiCorp Vault). All keys are cached locally inside the keyring’s cache to speed up fetching keys. They can be separated into two categories of plugins that use the following:

Local resource as a backend for storing keys, like local file (we call this resource file-based keyring)
Remote resource as a backend for storing keys, like Vault server (we call this resource server-based keyring)

The separation is important because depending on the backend, keyrings behave a bit differently, not only when storing/fetching keys but also on startup.

In the case of a file-based keyring, the keyring on startup loads the entire content of the keyring (i.e., key id, key user, key type, together with keys themselves) into the cache.

In the case of a server-based keyring (for instance, Vault server), the server loads only a list of the key ids and key users on startup, so startup is not slowed by retrieving all of the keys from the server. The keys are lazy-loaded, which means that the first time a server requests a key, keyring_vault asks the Vault server to send it. The keyring then caches the key in memory so that, in the future, the server can use memory access instead of a TLS connection to the Vault server to retrieve the key. It is worth mentioning what information is stored in the keyring backend.

A record in the keyring consists of the following:

key id – An ID of the key, for instance: INNODBKey-764d382a-7324-11e9-ad8f-9cb6d0d5dc99-1
key type – The type of key, based on the encryption algorithm used, possible values are: “AES”, “RSA” or “DSA”
key length – Length is measured in bytes, AES: 16, 24 or 32, RSA 128, 256, 512, and DSA 128, 256 or 384.
user – Owner of the key. If this key is a system key, such as the Master Key, this field is empty. When the key is created with keyring_udf, this field is the owner of the key.
key itself

Each key is uniquely identified by the pair (key_id, user).

There are also differences when it comes to storing and deleting keys.

The file-based keyring operation should be faster, and the operation is. You may assume the key storage is just a single write of a key to a file, but more tasks are involved. Before any file-based keyring modification, the keyring creates a backup file with the entire content of the keyring and places this backup file next to the keyring file. Let’s say your keyring file is called my_biggest_secrets; the backup is named my_biggest_secrets.backup. Next, the keyring modifies the cache to add or remove a key, and if this task is successful, it dumps (i.e., rewrites the entire content of a keyring file) from the cache into your keyring file. On rare occasions, such as a server crash, you can observe this backup file. The backup file is deleted by keyring next time the keyring is loaded (generally after the server restart).

When storing or deleting a key, the server-based keyring must connect to the backend server and send it a “store the key” or “delete the key” request.

Let’s get back to the speed of the server startup. Apart from the keyring itself impacting the startup time, there is also the matter of how many keys must be retrieved from the backend server on startup. Of course, this is especially important for server-based keyrings. On startup, the server checks which key is needed to decrypt each encrypted table or tablespace and fetches this key from the keyring. On a “clean” server with Master Key encryption, there should be one Master Key that must be fetched from the keyring. However, more keys can be required, for instance, when a slave is re-created from a master backup. In those cases, it is good to consider a Master Key rotation. I will talk more about that in future blog posts, but I just wanted to outline here that a server that is using multiple Master Keys might take a bit longer to start up, primarily when a server-based keyring is used.

Now let’s talk some more about keyring_file. When I was developing keyring_file, one concern was how to be sure that the keyring file was not changed underneath the running server. In 5.7, the check is based on file stats, which is not a perfect solution; in 8.0 it was replaced with a SHA256 checksum.

When keyring_file is first started, the file stats and checksum are calculated and remembered by the server, and the changes are only applied if those match. Of course, the checksum is updated as the file gets updated.
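
The idea of remembering file stats plus a checksum, and refusing to apply changes when they no longer match, can be sketched like this. It is an illustration only: the class, the choice of size and modification time as “file stats”, and the file content are assumptions, not the plugin's actual logic.

```python
import hashlib
import os
import tempfile


def fingerprint(path):
    """Return (size, mtime in ns, SHA256 hex digest) for the file at path."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return (st.st_size, st.st_mtime_ns, digest)


class GuardedFile:
    """Applies a change only if the file still matches what was last seen."""

    def __init__(self, path):
        self.path = path
        self.known = fingerprint(path)  # remembered when first loaded

    def write(self, data):
        if fingerprint(self.path) != self.known:
            return False  # the file changed under us: refuse to apply
        with open(self.path, "wb") as f:
            f.write(data)
        self.known = fingerprint(self.path)  # refresh after our own update
        return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "keyring")
    with open(path, "wb") as f:
        f.write(b"v1")
    guarded = GuardedFile(path)
    ok_first = guarded.write(b"v2")
    with open(path, "wb") as f:  # simulate a modification from outside
        f.write(b"tampered")
    ok_second = guarded.write(b"v3")
```

The first write is accepted because nothing touched the file in between; the second is refused because the content no longer matches the remembered checksum.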

We have covered lots of ground on keyrings so far. There is one more important topic, though, that is often forgotten or misunderstood: the per-server separation of keyrings, and why it is essential. What do I mean by that? I mean that each server (let’s say Percona Server) in a cluster should have a separate place on the Vault server where it stores its keys. Master Keys stored in the keyring have each Percona Server’s GUID embedded into their ids. Why is this important? Imagine you have one Vault server with keys, and all of the Percona Servers in your cluster are using it. The problem seems obvious: if all of the Percona Servers were using Master Keys without unique ids (for instance, id = 1, id = 2, etc.), all of the Percona Servers in the cluster would be using the same Master Key. The GUID is what provides this per-server separation.

Why talk about the per-server separation of keyrings, then, since there is already a separation with the unique GUID per Percona Server? Well, there is one more plugin, keyring_udf. With this plugin, a user of your server can store their own keys inside the Vault server. The problem arises when your user creates a key on, let’s say, server1, and then attempts to create a key with the same identifier (key_id) on server2, like this:

--server1:
select keyring_key_store('ROB_1','AES',"123456789012345");
1
--1 means success
--server2:
select keyring_key_store('ROB_1','AES',"543210987654321");
1

Wait. What!? Since both servers use the same Vault server, shouldn’t keyring_key_store fail on server2? Interestingly enough, if you try to do the same on just one server, you will get a failure:

--server1:
select keyring_key_store('ROB_1','AES',"123456789012345");
1
select keyring_key_store('ROB_1','AES',"543210987654321");
0

Right, ROB_1 already exists.

Let’s discuss the second example first. As we discussed earlier, keyring_vault (like any other keyring plugin) caches all of the key ids in memory. So when the new key ROB_1 is added on server1, apart from being sent to Vault, it is also added to the keyring’s cache. Now, when we try to add the same key a second time, keyring_vault checks whether this key already exists in the cache and errors out.

The story is different in the first example. The keyring on server1 has its own cache of the keys stored on the Vault server, and server2 has its own cache. After ROB_1 is added to the keyring’s cache on server1 and to the Vault server, the keyring’s cache on server2 is out of sync. The cache on server2 does not have the ROB_1 key; thus, keyring_key_store succeeds there and writes ROB_1 to the Vault server, which actually overwrites (!) the previous value. Now the key ROB_1 on the Vault server is 543210987654321. Interestingly enough, the Vault server does not block such actions and happily overwrites the old value.
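
Both scenarios can be reproduced with a small simulation. This is a sketch of the caching behavior described above, not of the real plugin or the real Vault API: SharedVault and ServerKeyring are hypothetical stand-ins, and the shared store is just a dictionary.

```python
class SharedVault:
    """Stand-in for one Vault server: a plain key_id -> value store."""

    def __init__(self):
        self.secrets = {}

    def write(self, key_id, value):
        # Like in the scenario above, an existing value is overwritten silently.
        self.secrets[key_id] = value


class ServerKeyring:
    """Stand-in for the keyring of one server, with its own cache of key ids."""

    def __init__(self, vault):
        self.vault = vault
        self.cached_ids = set(vault.secrets)  # key ids are cached at startup

    def key_store(self, key_id, value):
        if key_id in self.cached_ids:
            return 0  # failure: this server already knows the key
        self.vault.write(key_id, value)
        self.cached_ids.add(key_id)
        return 1  # success


vault = SharedVault()
server1 = ServerKeyring(vault)
server2 = ServerKeyring(vault)  # both caches start out empty

r1 = server1.key_store("ROB_1", "123456789012345")
r2 = server2.key_store("ROB_1", "543210987654321")  # server2's cache is stale
r3 = server1.key_store("ROB_1", "543210987654321")  # server1 sees a duplicate
final = vault.secrets["ROB_1"]
print(r1, r2, r3, final)
```

The duplicate check only ever consults the local cache, which is why the second server's store goes through and replaces the value written by the first.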

Now we see why this per-server separation on the Vault server can be significant: it matters if you allow the use of keyring_udf, and also if you want to keep the keys in your Vault in order. How can we ensure this separation on the Vault server?

There are two ways to achieve this separation on the Vault server. You can create one mount point per server, or you can use different paths inside the same mount point, with one path per server. It is best to explain these two approaches by example, so let’s have a look at the configuration files. First, for mount point separation:

--server1:
vault_url = http://127.0.0.1:8200
secret_mount_point = server1_mount
token = (...)
vault_ca = (...)
--server2:
vault_url = http://127.0.0.1:8200
secret_mount_point = server2_mount
token = (...)
vault_ca = (...)

We can see that server1 is using a different mount point than server2. With path separation, the config files would look like the following:

--server1:
vault_url = http://127.0.0.1:8200
secret_mount_point = mount_point/server1
token = (...)
vault_ca = (...)
--server2:
vault_url = http://127.0.0.1:8200
secret_mount_point = mount_point/server2
token = (...)
vault_ca = (...)

In this case, both servers are using the same secret mount point, “mount_point”, but different paths. When you create the first secret on server1 in this path, the Vault server automatically creates a “server1” directory. The same happens for server2. When you remove the last secret in mount_point/server1 or mount_point/server2, the Vault server removes these directories as well. As we can see, with path separation you need to create only one mount point and modify the configuration files so that the servers use separate paths. The mount point can be created with an HTTP request. With curl it’s:

curl -L -H "X-Vault-Token: TOKEN" --cacert VAULT_CA \
  --data '{"type":"generic"}' --request POST VAULT_URL/v1/sys/mounts/SECRET_MOUNT_POINT
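
If you prefer to script this, the same request can be assembled with Python’s standard library. The sketch below only builds the requests and does not send them; the helper name and the example mount point names are assumptions, and the URL and token are the placeholders used in the configuration examples above.

```python
import json
import urllib.request


def build_mount_request(vault_url, token, secret_mount_point):
    """Build (but do not send) the POST that creates a generic mount point."""
    return urllib.request.Request(
        url=f"{vault_url}/v1/sys/mounts/{secret_mount_point}",
        data=json.dumps({"type": "generic"}).encode("utf-8"),
        headers={"X-Vault-Token": token},
        method="POST",
    )


# One request per server, mirroring the mount point separation example.
mount_requests = [
    build_mount_request("http://127.0.0.1:8200", "TOKEN", name)
    for name in ("server1_mount", "server2_mount")
]
for req in mount_requests:
    print(req.get_method(), req.full_url)
```

To actually send a request you could pass it to urllib.request.urlopen, supplying an SSL context created with ssl.create_default_context(cafile=VAULT_CA) when the Vault server is reached over HTTPS.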

All of the fields (TOKEN, VAULT_CA, VAULT_URL, SECRET_MOUNT_POINT) correspond to the options from the keyring configuration file. Of course, you can also use the vault binary to do the same. The point is that mount point creation can be automated. I hope you will find this information helpful, and we will see each other in the next blog post of this series.

Thanks,
Robert

↧

Open Source: What You Do Today Impacts Billions of People

December 10, 2019, 6:30 am

If you work in open source, we’re going to bet you may not know that what you do every day affects a billion people. Surprised? You shouldn’t be! In fact, that number is most likely even larger, given the popularity of open source software throughout industries across the globe.

Open source software and services are at the heart of so many applications and critical business systems that we often forget just how much of society depends on them each and every day. And some of these systems are a matter of life and death!

Life and Death in Open Source

Years ago I helped deploy MySQL at one of the first hospital systems to embrace open source software, putting it at the heart of their patient record transfer system. When someone arrives in the emergency room or is prepping for surgery, the system enables all the hospital records to show up so doctors have the information they need to save the patient. That’s a very powerful use case, as this software literally can make the difference between life and death.

Right now, Percona has many different medical and health-related companies as customers, and some of them are on a corporate mandate to move off of Oracle and Microsoft to MySQL. They anticipate they will add some 5000 applications to their already large MySQL footprint this year alone, and many of these applications are directly tied to patients getting the services they need. Now think about this: one hospital group has over 40 million users who use their systems to get medical insurance, help from providers, and ensure access to medical care. They also happen to employ some 200,000 employees worldwide. A single software bug, or some inaccurate advice, could impact up to 40 million people. We really are only one step away from impacting millions of people.

Let that sink in for a minute.

The Internet is Alive Because of Us

Let’s move outside of the medical field for a second, and talk about the internet in general. Scratch that; let’s talk about a specific company in a specific industry. This company was a B2C company that offered services directly to consumers.

A few years back, this company had an estimated 3500 employees and 30 million subscribers. Then one day, they experienced a multi-day outage caused by a bad migration. But why would a bad migration alone cause a multi-day outage?

They had no backups.

That’s right — a multi-million dollar operation missed setting up simple MySQL backups. Granted, this wasn’t a life or death situation like it would have been at a hospital, but millions of people (customers!) were inconvenienced. In addition, hundreds of people lost their jobs, which in turn, affected a lot of families and thousands of people. One small slip and boom — millions of people were affected.

Billions and Billions Served

If we take a look at the largest companies in the world, you can bet on the fact that they are running some form of open source software to manage their business. Currently, the largest one has 2,300,000 employees and $515 billion in revenue, with 275 million customers each and every week. We could do the math here on just how many people a single line of code or a bug would affect, but it could get a little outrageous to keep track of, so let’s pick, say, number 40 on the list of the largest companies in the United States.

This company has 450,000 employees and 8.5 million daily customers at its 2,764 stores.

So let’s say this company is using MySQL to manage its customer, supply, and inventory databases. This would mean that not only are the company’s direct employees accessing and utilizing this software but so are its customers, vendors, transport companies, subsidiaries, and manufacturer/processing facilities.

Some quick math and estimates for this one company look like this:

Direct users:

450,000 employees + 8,500,000 customers + an estimated 250,000 “other” users = 9,200,000 people directly interact with MySQL at this company. Every day.

Dependents of those users:

If this company were to go out of business, or even just have a bad tech day, millions of people would be affected — even if they themselves didn’t work directly for the company. Using the above math, with 9,200,000 people directly using the software, let’s estimate that half of these people have a spouse and one child.

4,600,000 with a spouse and child = 13,800,000 people. Add in the 4,600,000 without a spouse or child and you get a grand total of 18,400,000 people each and every day depending on MySQL at this ONE company, just in the United States.
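
For anyone who wants to check the estimate or plug in numbers for another company, the arithmetic above fits in a few lines of Python. The figures are the ones quoted in this post; the variable names and the “half have a spouse and one child” split are simply the assumptions of the estimate written out.

```python
employees = 450_000
daily_customers = 8_500_000
other_users = 250_000  # estimated vendors, transport companies, and so on

direct_users = employees + daily_customers + other_users

# Assumption from the estimate: half of the direct users have a spouse and one child.
with_family = direct_users // 2
without_family = direct_users - with_family

# Each user with a family counts as three people: user, spouse, and child.
people_with_family = with_family * 3
total_dependent = people_with_family + without_family

print(f"{direct_users:,} direct users, {total_dependent:,} people in total")
```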

Every line of code, every bug you find and fix, every new feature optimized: what you work on every day in the open source software industry affects billions of people around the world. They depend on us to get it right, to keep their systems running efficiently and properly, and to be there when something goes wrong. If a single mid-sized company depends on MySQL to run properly for at least 18,400,000 people, imagine how many people and systems are actually depending on open source software for the same? Billions. And then some.

We Applaud You!

If you are one of the thousands who contribute to open source software, be proud! As the open-source software industry continues to grow, we should all be aware of how impactful the work we do is. We are literally impacting the health and well-being of those around us, and making the world better for billions of others.

↧

Webinar 12/19: Top 3 Features of MySQL

December 17, 2019, 6:55 am

MySQL has been ranked as the second most popular database since 2012 according to DB-Engines. Three features help it retain its top position: replication, storage engines, and NoSQL support. During this webinar, we’ll discuss the MySQL architecture surrounding these features and how to utilize their full power when your application hits performance or design limits.

This webinar is geared towards new MySQL users as well as those with other database administration experience. However, it’s also useful for experienced users looking to refresh their knowledge.

Please join Sveta Smirnova on Thurs, Dec 19, 10 to 11 am PST.

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

↧

Percona Backup for MongoDB in Action

December 17, 2019, 8:01 am

We recently released Percona Backup for MongoDB (PBM) as GA. It’s our open source tool for taking a consistent backup of a running standalone mongod instance, a Replica Set, or a Sharded Cluster. The following articles can give you an overview of the tool:

But now I would like to test it for real, so let’s see how it works for taking backups of a Replica Set.

Warning: Percona Backup for MongoDB supports Percona Server for MongoDB and MongoDB Community v3.6 and higher with MongoDB Replication enabled.

Architecture

At first, let’s briefly discuss the internals. Percona Backup for MongoDB consists of two actors: the pbm-agent and the pbm utility.

The pbm-agent is a process that has to be installed on each mongod node. The agent has a small footprint and is responsible for various tasks. For example, it detects whether a secondary node is the best candidate for the backup or restore operation and coordinates with the other nodes.

The pbm CLI controls all the agents and can be installed on any node with access to the MongoDB cluster. The following commands are available in the current 1.0.0 version:

Command	Description
store set (*)	Set up a backup store
store show (*)	Show the backup store associated with the active replica set.
backup	Make a backup
restore	Restore a backup
list	List the created backups

(*) these will become config --set / config --list in version 1.1

Installation

My test environment is the following:

3-node Replica Set with MongoDB 4.2 on AWS EC2 instances
OS: Ubuntu 18.04

The easiest and recommended way to install it is to use the official Percona repositories via the percona-release utility. See the Percona documentation for more details about Percona repositories and percona-release usage.

Enable the tools

percona-release enable tools

Install the package

apt update
apt install percona-backup-mongodb

The sample configuration files are placed into:

/etc/pbm-agent.conf
/etc/pbm-agent-storage.conf

You have to install the package on all the nodes of the replica set, because pbm-agent must run on each node.

Storage

Backup data can be stored on Amazon S3 or compatible storage, such as MinIO. Storing backups on a local filesystem directory also works but isn’t a top recommendation as it requires all servers involved to be given mounts to the same remote backup server.

For running the backup and restore operations, we need to set up a place where the files will be stored and retrieved. In the current 1.0 pbm version, the available types of store are:

Amazon Simple Storage Service (S3)
MinIO: an object storage compatible with Amazon Simple Storage Service
local file system

We’ll use AWS S3 in our test. It’s the easiest option at the moment, even for the recovery.

The storage details for pbm can be put into a YAML configuration file. The file contains all required options that pertain to one store. For remote stores, 1.0 supports only those compatible with Amazon Simple Storage Service.

We can create the following file storage.yaml, specifying our own access keys:

type: s3
s3:
   region: us-west-2
   bucket: pbm-test-bucket-78967
   credentials:
      access-key-id: "your-access-key-id-here"
      secret-access-key: "your-secret-key-here"

To configure pbm to use this storage, execute the following command:

$ pbm store set --config=storage.yaml --mongodb-uri="mongodb://127.0.0.1:27017/"

You don’t need to supply the store information for any subsequent operations.

Run pbm-agent on the Nodes

Now it’s time to turn on the pbm-agent process on the nodes of the cluster. Let’s run the following command:

root@mdb1:~# pbm-agent --mongodb-uri mongodb://172.30.2.213:27017 &
[1] 3589
root@mdb1:~# pbm agent is listening for the commands

In --mongodb-uri we have to provide the connection string to the local mongod server.

For the sake of simplicity, I didn’t enable authentication, but in case you have it (as suggested for production environments), you also need to provide the username and password. As an example, you can specify the --mongodb-uri as follows: mongodb://myuser:mysecretpwd@172.30.2.213:27017
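
One practical detail when credentials go into the URI: characters such as @, : or / in the username or password must be percent-encoded, otherwise the connection string is ambiguous. A minimal sketch of building the URI in Python follows; the helper name and the example password are made up for illustration.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(host, port, user=None, password=None):
    """Return a mongodb:// URI, percent-encoding the credentials if given."""
    if user is None:
        return f"mongodb://{host}:{port}"
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"


plain_uri = build_mongodb_uri("172.30.2.213", 27017)
auth_uri = build_mongodb_uri("172.30.2.213", 27017, "myuser", "p@ss:word/1")
print(plain_uri)
print(auth_uri)
```

The resulting string can then be passed to pbm-agent or pbm through --mongodb-uri.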

After executing the same command on all the nodes, we are ready to take the backup.

Take the Backup

Now it’s time to run our first backup. I have installed pbm on all the machines of the replica set, and we can use one of them. In a production environment with a lot of nodes, it is worth considering running it from another machine, but be sure that the machine can access mongod on the database nodes.

Let’s run the following command on one of the nodes and see what happens. Please note that the following box shows log messages coming from the different nodes.

ubuntu@mdb2:~$ pbm backup --mongodb-uri mongodb://127.0.0.1:27017
Beginning backup '2019-12-12T11:21:10Z' to remote store s3://pbm-test-backups1
ubuntu@mdb2:~$ 2019/12/12 11:21:10 Got command backup 2019-12-12T11:21:10Z
2019/12/12 11:21:10 Backup 2019-12-12T11:21:10Z started on node corra/mdb2:27017
2019/12/12 11:21:18 Oplog started
2019-12-12T11:21:18.659+0000	writing admin.system.version to archive on stdout
2019-12-12T11:21:18.669+0000	done dumping admin.system.version (1 document)
2019-12-12T11:21:18.669+0000	writing admin.pbmCmd to archive on stdout
2019-12-12T11:21:18.671+0000	done dumping admin.pbmCmd (18 documents)
2019-12-12T11:21:18.672+0000	writing admin.pbmConfig to archive on stdout
2019-12-12T11:21:18.695+0000	done dumping admin.pbmConfig (1 document)
2019-12-12T11:21:18.695+0000	writing admin.pbmOp to archive on stdout
2019-12-12T11:21:18.698+0000	done dumping admin.pbmOp (1 document)
2019-12-12T11:21:18.698+0000	writing admin.pbmBackups to archive on stdout
2019-12-12T11:21:18.700+0000	done dumping admin.pbmBackups (1 document)
2019-12-12T11:21:18.701+0000	writing corra.retaurants to archive on stdout
2019-12-12T11:21:18.811+0000	done dumping corra.retaurants (3772 documents)
2019-12-12T11:21:18.811+0000	writing people-bson.people to archive on stdout
2019-12-12T11:21:44.949+0000	done dumping people-bson.people (1000000 documents)
2019/12/12 11:21:44 mongodump finished, waiting to finish oplog
2019/12/12 11:21:45 Backup 2019-12-12T11:21:10Z finished


ubuntu@mdb1:~$ 2019/12/12 11:21:10 Got command backup 2019-12-12T11:21:10Z
2019/12/12 11:21:10 Backup has been scheduled on another replset node


ubuntu@mdb3:~$ 2019/12/12 11:21:10 Got command backup 2019-12-12T11:21:10Z
2019/12/12 11:21:10 Node in not suitable for backup

We have launched the backup on mdb2, which is currently a SECONDARY. We can see that all the pbm-agents received the command at the same time and decided which node was the best one for running the dump. It was mdb2. The dump of all the collections is taken using mongodump.

We can also notice from the log messages that mdb3 was not selected because it’s currently the PRIMARY.

The picture below shows what you should see on AWS S3 if the streaming of the dump files worked as expected. The JSON file contains metadata about the backup; the other two compressed files contain the documents of the collections and the oplog.

After taking further backups you should see the following:

List the Backups

It’s not mandatory to use the S3 dashboard to see and manage the files; you can use pbm as well. To list all available backups, run the following:

ubuntu@mdb2:~$ pbm list --mongodb-uri mongodb://127.0.0.1:27017
Backup history:
  2019-12-12T11:21:10Z
  2019-12-12T11:41:27Z
  2019-12-12T11:43:35Z

Restore a Backup

At last, let’s test the recovery. As usual, you just need to use the pbm client.

For example, we would like to restore the first dump we have taken. We simply need to specify the right date and time as returned by the previous list command. That’s all.

ubuntu@mdb2:~$ pbm restore 2019-12-12T11:21:10Z --mongodb-uri mongodb://127.0.0.1:27017
Beginning restore of the snapshot from 2019-12-12T11:21:10Z

Let’s take a look at the log messages on all the nodes to analyze what’s happening.

ubuntu@mdb2:~$ 2019/12/12 11:51:50 Got command restore 2019-12-12T11:21:10Z
2019/12/12 11:51:50 Node in not suitable for restore

ubuntu@mdb1:~$ pbm 2019/12/12 11:51:50 Got command restore 2019-12-12T11:21:10Z
2019/12/12 11:51:50 Node in not suitable for restore

ubuntu@mdb3:~$ 2019/12/12 11:51:51 Got command restore 2019-12-12T11:21:10Z
2019/12/12 11:51:51 [INFO] Restore of '2019-12-12T11:21:10Z' started
2019-12-12T11:51:51.314+0000	preparing collections to restore from
2019-12-12T11:51:51.336+0000	reading metadata for corra.retaurants from archive on stdin
2019-12-12T11:51:51.380+0000	restoring corra.retaurants from archive on stdin
2019-12-12T11:51:53.300+0000	no indexes to restore
2019-12-12T11:51:53.300+0000	finished restoring corra.retaurants (3772 documents, 0 failures)
2019-12-12T11:51:53.304+0000	reading metadata for people-bson.people from archive on stdin
2019-12-12T11:51:53.348+0000	restoring people-bson.people from archive on stdin
2019-12-12T11:52:18.139+0000	no indexes to restore
2019-12-12T11:52:18.139+0000	finished restoring people-bson.people (1000000 documents, 0 failures)
2019/12/12 11:52:18 [INFO] Restore of '2019-12-12T11:21:10Z' finished successfully

The recovery runs on the PRIMARY node, mdb3. The SECONDARY nodes reported that they were not suitable for restore, as expected.

Warning: the instance that you will restore your backup to may already have data. After running pbm restore, the instance will have both its existing data and the data from the backup. To make sure that your data is consistent, either clean up the target instance or use an instance without data.

Conclusion

Percona Backup for MongoDB is a good and reliable tool for doing backups of any MongoDB deployment, even a large Sharded Cluster. As shown in the article, pbm is also quite easy to use. That said, there’s still a lot of work to do in order to add more features.

In a future article, we’ll test pbm backup and recovery also on a larger sharded cluster. So, stay tuned for the next chapter and for the next versions of the tool.

Note: TLS does not work in v1.0 due to a bug whose fix is awaiting release.

Consider the links below to follow the development and new releases.

↧