
Webinar 2/19: 17 Things Developers Need to Know About Databases


Most applications use databases, yet many fail to follow even the most basic best practices, resulting in poor performance, downtime, and security incidents.

In this presentation, we will look into the foundational best practices you as a Developer should know about databases, with particular focus on the most popular Open Source Databases – MySQL, PostgreSQL, and MongoDB.

Please join Percona CEO Peter Zaitsev on Wednesday, February 19, 2020, at 11 am EST for his webinar “17 Things Developers Need to Know About Databases”.

Register Now

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.


How to Measure MySQL Performance in Kubernetes with Sysbench


As our Percona Kubernetes Operator for Percona XtraDB Cluster gains in popularity, I am getting questions about its performance and how to measure it properly. Sysbench is the most popular tool for database performance evaluation, so let’s review how we can use it with Percona XtraDB Cluster Operator.

Operator Setup

I will assume that you have an operator running (if not, this is the topic for a different post). We have the documentation on how to get it going, and we will start a three-node cluster using the following cr.yaml file:

apiVersion: pxc.percona.com/v1-3-0
kind: PerconaXtraDBCluster
metadata:
  name: cluster1
  finalizers:
    - delete-pxc-pods-in-order
spec:
  secretsName: my-cluster-secrets
  sslSecretName: my-cluster-ssl
  sslInternalSecretName: my-cluster-ssl-internal
  allowUnsafeConfigurations: false
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster-operator:1.3.0-pxc
    resources:
      requests:
        memory: 1G
        cpu: 600m
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    volumeSpec:
      emptyDir: {}
    gracePeriod: 600
  proxysql:
    enabled: false
    size: 3
    image: percona/percona-xtradb-cluster-operator:1.3.0-proxysql
    resources:
      requests:
        memory: 1G
        cpu: 600m
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 2Gi
    podDisruptionBudget:
      maxUnavailable: 1
    gracePeriod: 30
  pmm:
    enabled: false
    image: percona/percona-xtradb-cluster-operator:1.3.0-pmm
    serverHost: monitoring-service
    serverUser: pmm

If we are successful, we will have three pods running:

NAME             READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
cluster1-pxc-0   1/1     Running   0          2m27s   192.168.139.65   node-3   <none>           <none>
cluster1-pxc-1   1/1     Running   0          95s     192.168.247.1    node-2   <none>           <none>
cluster1-pxc-2   1/1     Running   0          73s     192.168.84.130   node-1   <none>           <none>

It’s important to note that IP addresses allocated are internal to Kubernetes Pods and not routable outside of Kubernetes.

Sysbench on a Host External to Kubernetes

In this part, let’s assume we want to run a client (sysbench) on a separate host which is not part of the Kubernetes system. How do we do it? We need to expose one of the pods (or several) to the external world, and for this we use a Kubernetes service of type NodePort:

kubectl expose po cluster1-pxc-0 --type=NodePort
 
kubectl get svc
NAME           TYPE      CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                AGE
cluster1-pxc-0 NodePort  10.104.69.70   <none>  3306:30160/TCP,4444:31045/TCP,4567:30671/TCP,4568:30029/TCP   8s

So here we see that port 3306 (MySQL port) is exposed as port 30160 on node-3 (node where pod cluster1-pxc-0 is running). Please note this will invoke a kube-proxy process on node-3, which will handle incoming traffic on port 30160 and route it to the cluster1-pxc-0 pod. Kube-proxy by itself will introduce some networking overhead.

To find the IP address of Node-3:

kubectl get nodes -o wide
NAME     STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   
node-3   Ready    <none>   29m   v1.17.2   147.75.56.103   <none>

So now we can connect the dots: point the mysql client at IP 147.75.56.103, port 30160, and create the database sbtest, which we need in order to run sysbench:

mysql -h147.75.56.103 -P30160 -uroot -proot_password
> create database sbtest;

And now we can prepare data for sysbench (never mind some of the parameters for now; we will come back to them later).

sysbench oltp_read_only --tables=10 --table_size=1000000  --mysql-host=147.75.56.103  --mysql-port=30160 --mysql-user=root --mysql-password=root_password --mysql-db=sbtest  prepare

Sysbench Running Inside Kubernetes

When we run sysbench inside Kubernetes, all these networking steps become unnecessary. It simplifies a lot of things, while making one thing more complicated: how do you actually start a pod with sysbench?

To start, we need an image with sysbench, and conveniently we already have one available on Docker Hub as perconalab/sysbench, so we will use that one. With an image, you can prepare a yaml file and start a pod with kubectl create -f sysbench.yaml, or, as I prefer, invoke it directly from the command line (which is a little more elaborate):

kubectl run -it --rm sysbench-client --image=perconalab/sysbench:latest --restart=Never -- bash

This way, Kubernetes will schedule sysbench-client pod on any available node, which may not be something we want. To schedule sysbench-client on a specific node, we can use:

kubectl run -it --rm sysbench-client --image=perconalab/sysbench:latest --restart=Never --overrides='{ "apiVersion": "v1", "spec": { "nodeSelector": { "kubernetes.io/hostname": "node-3" } } }' -- bash

This will start sysbench-client on node-3. Now, from the pod’s command line, we can access MySQL using just the cluster1-pxc-0 hostname:

sysbench oltp_read_only --tables=10 --table_size=1000000  --mysql-host=cluster1-pxc-0 --mysql-user=root --mysql-password=root_password --mysql-db=sbtest  prepare
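
The actual benchmark run can then be started from the same pod against the cluster1-pxc-0 hostname. A minimal sketch, reusing the run options that are covered in the next section:

sysbench oltp_read_write --tables=10 --table_size=1000000 --mysql-host=cluster1-pxc-0 --mysql-user=root --mysql-password=root_password --mysql-db=sbtest --time=300 --threads=16 --report-interval=1 run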

A Quick Intro to Sysbench

Although we have covered sysbench multiple times, I was asked to provide a basic intro for different scenarios, so I would like to review some basic options for sysbench.

Prepare Data

Before running a benchmark, we need to prepare the data. From our previous example:

sysbench oltp_read_only --tables=10 --table_size=1000000  --mysql-host=147.75.56.103  --mysql-port=30160 --mysql-user=root --mysql-password=root_password --mysql-db=sbtest  prepare

This will create ten tables with 1mln rows each, about 250MB per table, for a total of 2.5GB of data. This gives us an idea of what knobs we can use to generate less or more data.

If we want, say, 25GB of data, we can use either 100 tables with 1mln rows each or ten tables with 10mln rows each. For 50GB of data, we can use 200 tables with 1mln rows, ten tables with 20mln rows, or any combination of tables and rows that gives 200mln rows in total.
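
For example, a prepare run for the ~25GB case (100 tables with 1mln rows each), reusing the host and port from the earlier example, could look like this:

sysbench oltp_read_only --tables=100 --table_size=1000000 --mysql-host=147.75.56.103 --mysql-port=30160 --mysql-user=root --mysql-password=root_password --mysql-db=sbtest prepare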

Running Benchmark

Sysbench’s OLTP scenarios provide the oltp_read_only and oltp_read_write scripts. As you can guess from the names, oltp_read_only will generate only SELECT queries, while oltp_read_write will generate SELECT, UPDATE, INSERT, and DELETE queries.

Examples:

Read-only

sysbench oltp_read_only --tables=10 --table_size=1000000  --mysql-host=147.75.198.7 --mysql-port=32385 --mysql-user=root --mysql-password=root_password --mysql-db=sbtest --time=300 --threads=16 --report-interval=1 run

Read-write

sysbench oltp_read_write --tables=10 --table_size=1000000  --mysql-host=147.75.198.7 --mysql-port=32385 --mysql-user=root --mysql-password=root_password --mysql-db=sbtest --time=300 --threads=16 --report-interval=1 run

Parameters to Play

From our example, you can see some parameters you can play with:

  • --threads – how many user threads will connect to the database and generate queries. A value of one will generate a single-threaded load.
  • --time – how long to run the benchmark. It may vary from a very short period (60 seconds or so) to very long (hours and hours) if we want to see the stability of long runs.
  • --report-interval=1 – how often to report in-progress results. I often use one second to see the variance in performance at one-second resolution; a combined example is shown below.
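
As a convenience, here is a minimal shell sketch (host, port, and credentials reused from the examples above) that runs the read-write workload at increasing thread counts:

for t in 1 4 16 64; do
  sysbench oltp_read_write --tables=10 --table_size=1000000 \
    --mysql-host=147.75.198.7 --mysql-port=32385 \
    --mysql-user=root --mysql-password=root_password --mysql-db=sbtest \
    --time=300 --threads=$t --report-interval=1 run
done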

Results Interpretation

Running sysbench from one of the examples, you can see the following output:

[ 289s ] thds: 16 tps: 1872.97 qps: 37476.47 (r/w/o: 26237.63/6623.91/4614.93) lat (ms,95%): 10.09 err/s: 0.00 reconn/s: 0.00
[ 290s ] thds: 16 tps: 1913.93 qps: 38289.67 (r/w/o: 26808.07/6797.76/4683.84) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00
[ 291s ] thds: 16 tps: 1562.75 qps: 31250.00 (r/w/o: 21874.50/5545.11/3830.39) lat (ms,95%): 23.95 err/s: 0.00 reconn/s: 0.00
[ 292s ] thds: 16 tps: 1817.99 qps: 36399.89 (r/w/o: 25473.92/6422.98/4502.99) lat (ms,95%): 11.24 err/s: 0.00 reconn/s: 0.00
[ 293s ] thds: 16 tps: 1632.31 qps: 32609.29 (r/w/o: 22832.40/5761.11/4015.77) lat (ms,95%): 24.38 err/s: 0.00 reconn/s: 0.00
[ 294s ] thds: 16 tps: 1917.99 qps: 38368.81 (r/w/o: 26857.87/6779.97/4730.98) lat (ms,95%): 9.56 err/s: 0.00 reconn/s: 0.00
[ 295s ] thds: 16 tps: 1744.97 qps: 34917.38 (r/w/o: 24441.56/6188.89/4286.92) lat (ms,95%): 13.46 err/s: 0.00 reconn/s: 0.00
[ 296s ] thds: 16 tps: 1913.02 qps: 38279.50 (r/w/o: 26790.35/6746.09/4743.06) lat (ms,95%): 9.91 err/s: 0.00 reconn/s: 0.00
[ 297s ] thds: 16 tps: 1723.01 qps: 34408.22 (r/w/o: 24086.16/6090.04/4232.03) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
[ 298s ] thds: 16 tps: 1725.63 qps: 34530.62 (r/w/o: 24173.84/6105.70/4251.09) lat (ms,95%): 16.41 err/s: 0.00 reconn/s: 0.00
[ 299s ] thds: 16 tps: 1895.97 qps: 37925.41 (r/w/o: 26552.59/6658.90/4713.93) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00
[ 300s ] thds: 16 tps: 1866.92 qps: 37358.45 (r/w/o: 26153.91/6589.73/4614.81) lat (ms,95%): 9.91 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            7307986
        write:                           1695485
        other:                           1436509
        total:                           10439980
    transactions:                        521999 (1739.67 per sec.)
    queries:                             10439980 (34793.42 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          300.0510s
    total number of events:              521999

Latency (ms):
         min:                                    3.52
         avg:                                    9.19
         max:                                  463.32
         95th percentile:                       15.83
         sum:                              4799073.43

Threads fairness:
    events (avg/stddev):           32624.9375/864.41
    execution time (avg/stddev):   299.9421/0.02

The first part is the interval reports (one second, as we asked). There we can see how many threads are running, and the most interesting parts are the “tps” and “lat” columns, which report throughput and latency respectively for the given period of time.

In general, we want to see higher throughput and lower latency when we compare different experiments.

And the last part is the total statistics. The part we usually pay attention to is:

transactions:                        521999 (1739.67 per sec.)

And

Latency (ms):

More transactions and lower latency typically correspond to better performance.

Introduction to MySQL 8.0 Common Table Expressions (Part 1)


This blog is the first part of a two-article series. In this article, I’m going to introduce the Common Table Expression (CTE), a new feature available in MySQL 8.0, as well as in Percona Server for MySQL 8.0.

What is a Common Table Expression?

We can define a CTE as an alternative to a derived table. In a way, a CTE simplifies complex joins and subqueries, improving the readability of queries. The CTE is part of ANSI SQL 99 and was introduced in MySQL 8.0.1. The same feature is also available in Percona Server for MySQL 8.0.

The main reasons for using CTE are:

  • Better readability of the queries
  • Can be referenced multiple times in the same query
  • Improved performance
  • A valid alternative to a VIEW, if your user cannot create VIEWs
  • Easier chaining of multiple CTEs
  • Possibility to create recursive queries: this can be really useful when dealing with hierarchical data

SELECT, UPDATE, and DELETE statements can reference the CTE.

Note: for the examples in this article, I’ll use the world database you can download from the MySQL site.

 

How to Create and Use a CTE

We have said that a CTE is like a derived table when using a subquery, but the declaration is moved before the main query. A new dedicated clause is needed: WITH.

Let’s start with a subquery with a derived table to find the most populated countries in Europe:

mysql> SELECT Name, Population 
    -> FROM (SELECT * FROM country WHERE continent='Europe') AS derived_t 
    -> ORDER BY Population DESC LIMIT 5;
+--------------------+------------+
| Name               | Population |
+--------------------+------------+
| Russian Federation |  146934000 |
| Germany            |   82164700 |
| United Kingdom     |   59623400 |
| France             |   59225700 |
| Italy              |   57680000 |
+--------------------+------------+

Let’s rewrite it using the CTE instead:

mysql> WITH cte AS (SELECT * FROM country WHERE continent='Europe')
    -> SELECT Name, Population 
    -> FROM cte 
    -> ORDER BY Population DESC LIMIT 5;
+--------------------+------------+
| Name               | Population |
+--------------------+------------+
| Russian Federation |  146934000 |
| Germany            |   82164700 |
| United Kingdom     |   59623400 |
| France             |   59225700 |
| Italy              |   57680000 |
+--------------------+------------+

The syntax is quite simple. Before your query, using WITH, you can define the CTE, or even multiple CTEs. After that, you can reference all the CTEs in the query as many times as you need. You can think of a CTE as a sort of pre-materialized query, available as a temporary table for the scope of your main query.
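
As a small illustration of the multiple-reference point (a sketch against the same world database), the following query reads the CTE twice, once in the FROM clause and once in a subquery, to compare the most populated European countries with the European average:

WITH eur AS (SELECT Name, Population FROM country WHERE continent='Europe')
SELECT Name, Population,
       ROUND(Population / (SELECT AVG(Population) FROM eur), 2) AS ratio_to_avg
FROM eur
ORDER BY Population DESC LIMIT 5;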

We can also specify the column names if a parenthesized list of names follows the CTE name:

mysql> WITH cte(eur_name, eur_population) AS (SELECT Name, Population FROM country WHERE continent='Europe')
    -> SELECT eur_name, eur_population
    -> FROM cte
    -> ORDER BY eur_population DESC LIMIT 5;
+--------------------+----------------+
| eur_name           | eur_population |
+--------------------+----------------+
| Russian Federation |      146934000 |
| Germany            |       82164700 |
| United Kingdom     |       59623400 |
| France             |       59225700 |
| Italy              |       57680000 |
+--------------------+----------------+

A CTE can also be used as the data source for updating other tables, as in the following examples:

# create a new table 
mysql> CREATE TABLE country_2020 ( Code char(3), Name char(52), Population_2020 int, PRIMARY KEY(Code) );
Query OK, 0 rows affected (0.10 sec)

# copy original data
mysql> INSERT INTO country_2020 SELECT Code, Name, Population FROM country;
Query OK, 239 rows affected (0.01 sec)
Records: 239  Duplicates: 0  Warnings: 0

# increase population of european countries by 10%
mysql> WITH cte(eur_code, eur_population) AS (SELECT Code, Population FROM country WHERE continent='Europe')  
    -> UPDATE country_2020, cte 
    -> SET Population_2020 = ROUND(eur_population*1.1) 
    -> WHERE Code=cte.eur_code;
Query OK, 46 rows affected (0.01 sec)
Rows matched: 46  Changed: 46  Warnings: 0

# delete from the new table all non-European countries
mysql> WITH cte AS (SELECT Code FROM country WHERE continent <> 'Europe') 
    -> DELETE country_2020 
    -> FROM country_2020, cte 
    -> WHERE country_2020.Code=cte.Code;
Query OK, 193 rows affected (0.02 sec)

mysql> SELECT * FROM country_2020 ORDER BY Population_2020 DESC LIMIT 5;
+------+--------------------+-----------------+
| Code | Name               | Population_2020 |
+------+--------------------+-----------------+
| RUS  | Russian Federation |       161627400 |
| DEU  | Germany            |        90381170 |
| GBR  | United Kingdom     |        65585740 |
| FRA  | France             |        65148270 |
| ITA  | Italy              |        63448000 |
+------+--------------------+-----------------+

A CTE can also be used for INSERT … SELECT queries like the following:

mysql> CREATE TABLE largest_countries (Code char(3), Name char(52), SurfaceArea decimal(10,2), PRIMARY KEY(Code) );
Query OK, 0 rows affected (0.08 sec)

mysql> INSERT INTO largest_countries
    -> WITH cte AS (SELECT Code, Name, SurfaceArea FROM country ORDER BY SurfaceArea DESC LIMIT 10)
    -> SELECT * FROM cte;
Query OK, 10 rows affected (0.02 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM largest_countries;
+------+--------------------+-------------+
| Code | Name               | SurfaceArea |
+------+--------------------+-------------+
| ARG  | Argentina          |  2780400.00 |
| ATA  | Antarctica         | 13120000.00 |
| AUS  | Australia          |  7741220.00 |
| BRA  | Brazil             |  8547403.00 |
| CAN  | Canada             |  9970610.00 |
| CHN  | China              |  9572900.00 |
| IND  | India              |  3287263.00 |
| KAZ  | Kazakstan          |  2724900.00 |
| RUS  | Russian Federation | 17075400.00 |
| USA  | United States      |  9363520.00 |
+------+--------------------+-------------+
10 rows in set (0.00 sec)

Think of a CTE as a temporary table that is pre-calculated, or materialized, before the main query. You can then reference that temporary table as many times as you need in your query.

Also, you can create multiple CTEs, and all of them can be used in the main query. The syntax is like the following:

WITH cte1 AS (SELECT ... FROM ... WHERE ...),
     cte2 AS (SELECT ... FROM ... WHERE ...)
SELECT ...
FROM table1, table2, cte1, cte2 ....
WHERE .....
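
For instance, a concrete (illustrative) pair of CTEs against the world database could be combined like this:

WITH big_countries AS (SELECT Code, Name FROM country WHERE Population > 100000000),
     big_cities AS (SELECT CountryCode, Name, Population FROM city WHERE Population > 9000000)
SELECT bc.Name AS country, c.Name AS city, c.Population
FROM big_countries bc JOIN big_cities c ON c.CountryCode = bc.Code
ORDER BY c.Population DESC;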

 

Scope of CTE

A CTE can be used even in subqueries, but in such a case, be aware of the scope: a CTE exists only within the scope of the single statement (or subquery) in which it is defined.

Consider the following valid queries:

WITH cte AS (SELECT Code FROM country WHERE Population<1000000)
SELECT * FROM city WHERE city.CountryCode IN 
(SELECT Code FROM cte);   # Scope: "cte" is visible to top SELECT

SELECT * FROM city WHERE city.CountryCode IN 
(WITH cte AS (SELECT Code FROM country WHERE Population<1000000)
SELECT Code from cte);   # Scope: "cte" is not visible to top SELECT

To avoid any trouble with the scope, the best way to use CTEs is to define all of them at the beginning of the top-level query. This way, all CTEs can be used wherever you need them, multiple times.

 

Chaining multiple CTEs

When creating multiple CTEs for a query, another interesting feature is chaining: any CTE can contain one or more references to previous CTEs in the chain.

The following example shows how chaining can be used. We would like to find the countries with the highest and lowest population density in the world. We create a chain of three CTEs; the last two contain a reference to the first one.

mysql> WITH density_by_country(country,density) AS 
       (SELECT Name, Population/SurfaceArea 
       FROM country 
       WHERE Population>0 and surfacearea>0), 
       max_density(country,maxdensity,label) AS 
       (SELECT country, density, 'max density' 
       FROM density_by_country 
       WHERE density=(SELECT MAX(density) FROM density_by_country)), 
       min_density(country,mindensity,label) AS 
       (SELECT country, density, 'min density' 
       FROM density_by_country 
       WHERE density=(SELECT MIN(density) FROM density_by_country)) 
       SELECT * FROM max_density UNION ALL SELECT * FROM min_density;
+-----------+------------+-------------+
| country   | maxdensity | label       |
+-----------+------------+-------------+
| Macao     | 26277.7778 | max density |
| Greenland | 0.0259     | min density |
+-----------+------------+-------------+

 

Now think about how you would rewrite the same query using derived tables instead. You would need to copy the definition of density_by_country several times. The final query would be much larger and probably less readable.

 

Use CTE Instead of VIEW

It could happen that your database user doesn’t have the right to create a VIEW. A CTE doesn’t require any specific grants apart from the ability to read the tables and columns, the same as for any regular SELECT.

In that case, you can use a CTE instead of a VIEW. Apart from the grants aspect, a CTE generally has better performance than a VIEW, as we’ll show later.

Let’s create a view and run a query using it.

mysql> CREATE VIEW city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city GROUP BY countrycode);

mysql> SELECT name, city_pop_by_country.sum_population/country.population ratio 
       FROM country, city_pop_by_country 
       WHERE country.code=city_pop_by_country.countrycode 
         AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+--------------------+--------+
| name               | ratio  |
+--------------------+--------+
| Bangladesh         | 0.0664 |
| Brazil             | 0.5048 |
| China              | 0.1377 |
| Germany            | 0.3194 |
| Egypt              | 0.2933 |
| Ethiopia           | 0.0510 |
| Indonesia          | 0.1767 |
| India              | 0.1216 |
| Iran               | 0.3845 |
| Japan              | 0.6153 |
| Mexico             | 0.6043 |
| Nigeria            | 0.1557 |
| Pakistan           | 0.2016 |
| Philippines        | 0.4072 |
| Russian Federation | 0.4706 |
| Turkey             | 0.4254 |
| United States      | 0.2825 |
| Vietnam            | 0.1173 |
+--------------------+--------+

# here is the explain of the query using the VIEW
mysql> EXPLAIN SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, city_pop_by_country WHERE country.code=city_pop_by_country.countrycode AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref                | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY     | country    | NULL       | ALL   | PRIMARY       | NULL        | NULL    | NULL               |  239 |    33.33 | Using where |
|  1 | PRIMARY     | <derived3> | NULL       | ref   | <auto_key0>   | <auto_key0> | 12      | world.country.Code |   16 |   100.00 | NULL        |
|  3 | DERIVED     | city       | NULL       | index | CountryCode   | CountryCode | 12      | NULL               | 4046 |   100.00 | NULL        |
|  2 | SUBQUERY    | <derived4> | NULL       | ALL   | NULL          | NULL        | NULL    | NULL               | 4046 |   100.00 | NULL        |
|  4 | DERIVED     | city       | NULL       | index | CountryCode   | CountryCode | 12      | NULL               | 4046 |   100.00 | NULL        |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+

 

Let’s try now to rewrite the same query using the CTE.

mysql> WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city GROUP BY countrycode)
       SELECT name, city_pop_by_country.sum_population/country.population ratio 
       FROM country, city_pop_by_country 
       WHERE country.code=city_pop_by_country.countrycode 
         AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+--------------------+--------+
| name               | ratio  |
+--------------------+--------+
| Bangladesh         | 0.0664 |
| Brazil             | 0.5048 |
| China              | 0.1377 |
| Germany            | 0.3194 |
| Egypt              | 0.2933 |
| Ethiopia           | 0.0510 |
| Indonesia          | 0.1767 |
| India              | 0.1216 |
| Iran               | 0.3845 |
| Japan              | 0.6153 |
| Mexico             | 0.6043 |
| Nigeria            | 0.1557 |
| Pakistan           | 0.2016 |
| Philippines        | 0.4072 |
| Russian Federation | 0.4706 |
| Turkey             | 0.4254 |
| United States      | 0.2825 |
| Vietnam            | 0.1173 |
+--------------------+--------+

# here is the EXPLAIN of the query using CTE
mysql> EXPLAIN WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city GROUP BY countrycode)
SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, city_pop_by_country WHERE country.code=city_pop_by_country.countrycode AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref                | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY     | country    | NULL       | ALL   | PRIMARY       | NULL        | NULL    | NULL               |  239 |    33.33 | Using where |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>   | <auto_key0> | 12      | world.country.Code |   16 |   100.00 | NULL        |
|  3 | SUBQUERY    | <derived2> | NULL       | ALL   | NULL          | NULL        | NULL    | NULL               | 4046 |   100.00 | NULL        |
|  2 | DERIVED     | city       | NULL       | index | CountryCode   | CountryCode | 12      | NULL               | 4046 |   100.00 | NULL        |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+

 

Taking a look at the two execution plans, we notice that the query using the VIEW has two DERIVED stages, while the CTE query has only one: the VIEW has to be materialized every time it is referenced, whereas the CTE is materialized once.

The sample database is small, but we can enable profiling to compare the execution times of the two queries.

mysql> SET profiling=1;

# execute the queries several times

mysql> show profiles;
+----------+------------+-----------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                         |
+----------+------------+-----------------------------------------------------------------------------------------------+
...
...                                                                                                                                                                                                                                                                                             |
|       35 | 0.00971925 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       36 | 0.00963100 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       37 | 0.00976900 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       38 | 0.00963875 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       39 | 0.00971200 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       40 | 0.00546550 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       41 | 0.00546975 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       42 | 0.00550325 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       43 | 0.00548000 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       44 | 0.00545675 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
+----------+------------+-----------------------------------------------------------------------------------------------+

The execution time of the query using the VIEW is around 0.0097 seconds, while with the CTE it is around 0.0054 seconds. So, the CTE is faster. With larger tables and more references to the view, the difference between the queries becomes even more relevant.

Using a CTE instead of a VIEW is more efficient because only a single materialization is needed, and the temporary table created can be referenced many times in the main query.

Conclusion

We have introduced the new Common Table Expression feature available in MySQL 8.0. Using CTEs, you can in most cases improve the readability of your queries, but you can also use CTEs instead of VIEWs to improve overall performance.

Using CTEs, it is also possible to create recursive queries. In the next article in this series, we’ll see examples of how to use recursive CTEs to generate series or to query hierarchies.

ProxySQL 1.4.16 and Updated proxysql-admin Tool Now Available


ProxySQL 1.4.16, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and database servers in the MySQL ecosystem (such as Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.16 source and binary packages available from the Percona download page for ProxySQL include ProxySQL Admin – a tool developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.16 are available as well. You can download the original ProxySQL from GitHub. GitHub hosts the documentation in the wiki format.

Bugs Fixed

  • PSQLADM-219: The scheduler was handling the pxc_maint_mode variable incorrectly. As a result, open connections were closed immediately. This bug has been fixed and now the scheduler only sets the node status to OFFLINE_SOFT. This prevents opening new connections and lets the already established connections finish their work. It is up to the user to decide when it is safe to start the node maintenance.

ProxySQL is available under the open source GPLv3 license.

Percona Monitoring and Management 2 Now Available on AWS Marketplace


We recently released Percona Monitoring and Management (PMM) version 2.2, and as part of this release, we made it available on the AWS Marketplace. In this blog post, I’ll explain how to find PMM in the AWS Marketplace and how to install it. It is important to note that PMM is provided as an AMI image with native support, so you can install PMM Server in your AWS environment directly from the AWS Marketplace; there is no need to create additional EC2 instances to run Docker.

To install PMM on AWS you need to search for PMM in AWS Marketplace or go directly to the PMM Server page.

 

This will get you to the main PMM Server page:

By pressing “Continue to Subscribe” (the big yellow button) you’ll proceed to the next step. Here you need to select the proper version of PMM Server you would like to run. If you have deployed PMM previously, you may have an earlier version selected by default, so it is important to ensure you pick the latest version.

Once you’ve selected the PMM 2 version, proceed by pressing the “Continue to Launch” button.

On the next screen, you’ll see the confirmation details of the instance deployment and a link to the EC2 console.

When you open the EC2 console, you should see the status of the instance. Click on your instance, find its IP address, and open it in a browser.

Note: How you access your instance depends on your default and selected Security Group and other settings. If you can’t open your PMM Server by IP in the browser, contact your IT or infrastructure team for help with the AWS EC2 settings.

On the EC2 console page, you also need to copy the Instance ID.

Due to AWS security restrictions, only a person who can access the EC2 console can access an AWS Marketplace product the first time.

After opening PMM in the browser you’ll see a verification screen where you need to enter the Instance ID.

Once that is done, you’ll have a typical PMM Server usage experience.

We strongly recommend changing the default password at first login to make your PMM Server instance more secure.

For more detailed and technical information please visit the Running PMM Server Using AWS Marketplace page on Percona.com.


 

Companies are increasingly embracing database automation and the advantages offered by the cloud.  Our new white paper discusses common database scenarios and the true cost of downtime to your business, including the potential losses that companies can incur without a well-configured database and infrastructure setup.

Download “The Hidden Costs of Not Properly Managing Your Databases”

Compression of PostgreSQL WAL Archives Becoming More Important


As hardware and software evolve, the bottlenecks in a database system also shift. Many old problems might disappear and new types of problems pop up.

Old Limitations

There were days when CPU and memory were the limitation. More than a decade back, servers with 4 cores were “high end”, and as a DBA my biggest worry was managing the available resources. For an old DBA like me, Oracle’s RAC architecture, which pools CPU and memory from multiple host machines for a single database, was a great attempt at solving it.

Then came the days of storage speed limitations, triggered by multi-core, multi-threaded processors becoming common while memory sizes and bus speeds kept increasing. Enterprises tried to solve it with sophisticated SAN drives, specialized storage with caches, and so on. But the limitation has remained for many years, even now as enterprises increasingly shift to NVMe drives.

Recently we started observing a new bottleneck that is becoming a pain point for many database users. As the capability of a single-host server increased, it started processing a huge number of transactions. There are systems that produce thousands of WAL files in a couple of minutes, and a few cases have been reported where WAL archiving to a cheaper, slower disk system was not able to catch up with WAL generation. To add more complexity, many organizations prefer to store WAL archives over a low-bandwidth network. (There is an inherent problem in PostgreSQL archiving: if it lags behind, it tends to lag more, because the archive process needs to search among the .ready files; this won’t be discussed here.)

In this blog post, I would like to bring to your attention the fact that compressing WALs can be easily achieved if you are not already doing it, as well as a query to monitor the archiving gap.

Compressing PostgreSQL WALs

The demands and requirements for compressing WALs before archiving are increasing day by day. Luckily, most of the PostgreSQL backup tools, like pgbackrest, WAL-G, etc., already take care of it. The archive_command invokes these tools, silently archiving for users.

For example, with pgBackRest, we can specify an archive_command which uses gzip behind the scenes:

ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=mystanza archive-push %p';

Or in WAL-G, we can specify:

ALTER SYSTEM SET archive_command = 'WALG_FILE_PREFIX=/path/to/archive /usr/local/bin/wal-g wal-push  %p';

This does the lz4 compression of WAL files.

But what if we are not using any specific backup tool for WAL compression and archiving? We can still compress the WALs using Linux tools like gzip or bzip2. Gzip is available in most Linux installations by default, so configuring it is an easy task.

alter system set archive_command = '/usr/bin/gzip -c %p > /home/postgres/archived/%f.gz';

However, 7za is the most interesting among all the compression options for WALs because it gives the highest compression as fast as possible, which is the major criterion in a system with high WAL generation. You may have to explicitly install 7za, which is part of the 7zip package, from an extra repo.

On CentOS 7 it is:

sudo yum install epel-release
sudo yum install p7zip

On Ubuntu it is:

sudo apt install p7zip-full

Now we should be able to specify the archive_command like this:

postgres=# alter system set archive_command = '7za a -bd -mx2 -bsp0 -bso0 /home/postgres/archived/%f.7z %p';
ALTER SYSTEM

In my test system, I could see archived WAL files of less than 200KB each. The size can vary according to the content of the WALs, which depends on the type of transactions on the database.

-rw-------. 1 postgres postgres 197K Feb  6 12:13 0000000100000000000000AA.7z
-rw-------. 1 postgres postgres 197K Feb  6 12:13 0000000100000000000000AB.7z
-rw-------. 1 postgres postgres 198K Feb  6 12:13 0000000100000000000000AC.7z
-rw-------. 1 postgres postgres 196K Feb  6 12:13 0000000100000000000000AD.7z
-rw-------. 1 postgres postgres 197K Feb  6 12:13 0000000100000000000000AE.7z

Compressing 16MB files down to the kilobyte range is definitely going to save network bandwidth and storage, while addressing the problem of archiving falling behind.

Restoring the WALs

Archiving and getting the highest compression is just one part, but we should also be able to restore them when required. The backup tools provide their own restore command options. For example, pgbackrest can use archive-get:

restore_command = 'pgbackrest --stanza=demo archive-get %f "%p"'

WAL-G provides wal-fetch for the same purpose.

In case you are opting for manual archive compression using gzip, you can use the gunzip utility in restore_command as follows:

gunzip -c /home/postgres/archived/%f.gz > %p

If you have already started using PostgreSQL 12, this parameter can be set using ALTER SYSTEM:

postgres=# alter system set restore_command = 'gunzip -c /home/postgres/archived/%f.gz > %p';
ALTER SYSTEM

Or, for 7za as shown above, you may use the following:

postgres=# alter system set restore_command = '7za x -so /home/postgres/archived/%f.7z > %p';
ALTER SYSTEM

However, unlike archive_command changes, restore_command changes require you to restart the standby database.
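
How you restart depends on your installation; the following is only a sketch (the service name and data directory are assumptions, adjust them to your environment):

# on a systemd-based install using the PGDG PostgreSQL 12 packages
sudo systemctl restart postgresql-12

# or directly with pg_ctl, pointing at the standby's data directory
pg_ctl restart -D /var/lib/pgsql/12/data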

Monitoring Archive Progress

The last archived WAL is available from the pg_stat_archiver view, but finding out the gap from the current WAL using the file names is a bit tricky. A sample query I used to find out how far WAL archiving is lagging is this:

select pg_walfile_name(pg_current_wal_lsn()),last_archived_wal,last_failed_wal, 
  ('x'||substring(pg_walfile_name(pg_current_wal_lsn()),9,8))::bit(32)::int*256 + 
  ('x'||substring(pg_walfile_name(pg_current_wal_lsn()),17))::bit(32)::int  -
  ('x'||substring(last_archived_wal,9,8))::bit(32)::int*256 -
  ('x'||substring(last_archived_wal,17))::bit(32)::int
  as diff from pg_stat_archiver;

The caveat here is that both the current WAL and the WAL being archived must be on the same timeline for this query to work, which is the common case; only very rarely will we encounter a different situation in production. So this query can be of good help when monitoring the WAL archiving of a PostgreSQL server.
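
If you want to keep an eye on the archiver continuously, the status can also be polled periodically; a convenience sketch, assuming local psql access as the postgres user:

watch -n 10 'psql -X -c "SELECT archived_count, last_archived_wal, failed_count, last_failed_wal FROM pg_stat_archiver;"'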

Learn more about the Percona Distribution for PostgreSQL.


Our white paper “Why Choose PostgreSQL?” looks at the features and benefits of PostgreSQL and presents some practical usage examples. We also examine how PostgreSQL can be useful for companies looking to migrate from Oracle.

Download PDF

Introduction to MySQL 8.0 Recursive Common Table Expression (Part 2)


This is the second part of a two-article series. In the first part, we introduced the Common Table Expression (CTE), a new feature available in MySQL 8.0 as well as Percona Server for MySQL 8.0. In this article, we’ll present the Recursive Common Table Expression. SQL is generally poor at handling recursive structures, but it is now possible in MySQL to write recursive queries. Before MySQL 8.0, recursion was possible only by creating stored routines.

What is a Recursive Common Table Expression?

A recursive CTE is one having a subquery that refers to its own name. It is particularly useful in the following cases:

  • To generate series
  • Hierarchical or tree-structured data traversal

Let’s see the main components of a recursive CTE. The following is the syntax to create it:

WITH RECURSIVE cte AS (
   initial_query    -- "seed" member
   UNION ALL
   recursive_query    -- recursive member that references the same CTE name
)
SELECT * FROM cte;    -- main query

First of all, the RECURSIVE clause is mandatory, and then there are two mandatory components. The seed member is the initial query, the one that will be executed at the first iteration. The recursive member is the query containing the reference to the same CTE name; this second component generates all the remaining rows of the main query.

The process stops when an iteration does not generate any rows. Be aware of this in order to avoid a runaway recursion that can exhaust memory.

It is important for recursive CTEs that the recursive member includes a condition to terminate the recursion. As a development technique, you can also force termination by placing a limit on execution time (a short example follows the list):

  • The cte_max_recursion_depth system variable enforces a limit on the number of recursion levels for CTEs. The server terminates the execution of any CTE that recurses more levels than the value of this variable. The default value is 1000.
  • The max_execution_time system variable enforces an execution timeout for SELECT statements executed within the current session.
  • The MAX_EXECUTION_TIME optimizer hint enforces a per-query execution timeout for the SELECT statement in which it appears.
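
A minimal sketch of how these limits can be applied in practice (the values are arbitrary examples):

-- raise the per-session recursion limit (default is 1000)
SET SESSION cte_max_recursion_depth = 10000;

-- cap a single SELECT to 2 seconds with the optimizer hint
WITH RECURSIVE seq AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM seq WHERE n < 5000)
SELECT /*+ MAX_EXECUTION_TIME(2000) */ COUNT(*) FROM seq;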

 

Generate Series

Let’s see now some simple usage of Recursive CTE to generate series.

One-Level Sequence

First, create a simple series of integer numbers from 1 to 10. This is a one-level sequence because the N+1 value is a function of the previous value N only.

WITH RECURSIVE natural_sequence AS
  ( SELECT 1 AS n       -- seed member: our sequence starts from 1
    UNION ALL
    SELECT n + 1 FROM natural_sequence    -- recursive member: reference to itself
    WHERE n < 10                          -- stop condition
  )
SELECT * FROM natural_sequence;           -- main query
+------+
| n    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
|   10 |
+------+

# let's see what happens if we miss the stop condition
mysql> WITH RECURSIVE natural_sequence AS ( SELECT 1 AS n   UNION ALL SELECT n + 1 FROM natural_sequence   ) SELECT * FROM natural_sequence;
ERROR 3636 (HY000): Recursive query aborted after 1001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.

Another typical example is calculating the factorial.

mysql> WITH RECURSIVE factorial(n, fact) AS ( 
          SELECT 0, 1 
          UNION ALL  
          SELECT n + 1, fact * (n+1)  
          FROM factorial 
          WHERE n < 20 ) 
       SELECT * from factorial;
+------+---------------------+
| n    | fact                |
+------+---------------------+
|    0 |                   1 |
|    1 |                   1 |
|    2 |                   2 |
|    3 |                   6 |
|    4 |                  24 |
|    5 |                 120 |
|    6 |                 720 |
|    7 |                5040 |
|    8 |               40320 |
|    9 |              362880 |
|   10 |             3628800 |
|   11 |            39916800 |
|   12 |           479001600 |
|   13 |          6227020800 |
|   14 |         87178291200 |
|   15 |       1307674368000 |
|   16 |      20922789888000 |
|   17 |     355687428096000 |
|   18 |    6402373705728000 |
|   19 |  121645100408832000 |
|   20 | 2432902008176640000 |
+------+---------------------+

 

Two-Level Sequence

In this case, we would like to create a two-level sequence where the N+2 value is a function of the two previous values N+1 and N.

The typical example here is the Fibonacci Series; each number is the sum of the two preceding ones, starting from 0 and 1.  Let’s calculate the first 20 items of the Fibonacci series.

mysql> WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS (   
          SELECT 1, 0, 1   
          UNION ALL   
          SELECT n + 1, next_fib_n, fib_n + next_fib_n     
          FROM fibonacci 
          WHERE n < 20 ) 
       SELECT * FROM fibonacci;
+------+-------+------------+
| n    | fib_n | next_fib_n |
+------+-------+------------+
|    1 |     0 |          1 |
|    2 |     1 |          1 |
|    3 |     1 |          2 |
|    4 |     2 |          3 |
|    5 |     3 |          5 |
|    6 |     5 |          8 |
|    7 |     8 |         13 |
|    8 |    13 |         21 |
|    9 |    21 |         34 |
|   10 |    34 |         55 |
|   11 |    55 |         89 |
|   12 |    89 |        144 |
|   13 |   144 |        233 |
|   14 |   233 |        377 |
|   15 |   377 |        610 |
|   16 |   610 |        987 |
|   17 |   987 |       1597 |
|   18 |  1597 |       2584 |
|   19 |  2584 |       4181 |
|   20 |  4181 |       6765 |
+------+-------+------------+

 

Date Sequence

Let’s consider having a simple table containing our shop’s sales such as the following:

CREATE TABLE sales (
id INT AUTO_INCREMENT PRIMARY KEY,
order_date DATE,
product VARCHAR(20),
price DECIMAL(10,2));

# populate the table
INSERT INTO sales(order_date, product, price) 
VALUES('2020-02-01','DVD PLAYER',100.50),('2020-02-01','TV',399.99),('2020-02-02','LAPTOP',1249.00),
('2020-02-04','DISHWASHER',500.00),('2020-02-04','TV',699.00),('2020-02-06','LAPTOP',990.50),('2020-02-06','HAIRDRYER',29.90),
('2020-02-06','GAME CONSOLE',299.00),('2020-02-07','BOOK',9.00),('2020-02-07','REFRIGERATOR',600.00);

# let's run a query to generate the sales report by day
SELECT order_date, SUM(price) AS sales
FROM sales 
GROUP BY order_date;
+------------+---------+
| order_date | sales   |
+------------+---------+
| 2020-02-01 |  500.49 |
| 2020-02-02 | 1249.00 |
| 2020-02-04 | 1199.00 |
| 2020-02-06 | 1319.40 |
| 2020-02-07 |  609.00 |
+------------+---------+

Notice, however, that our sales report has missing dates: Feb 3rd and Feb 5th. We would like to generate a report including even the dates with no sales.

A recursive CTE can help.

WITH RECURSIVE dates(date) AS (
   SELECT '2020-02-01' 
   UNION ALL
   SELECT date + INTERVAL 1 DAY 
   FROM dates
   WHERE date < '2020-02-07' )
SELECT dates.date, COALESCE(SUM(price), 0) sales
FROM dates LEFT JOIN sales ON dates.date = sales.order_date
GROUP BY dates.date;
+------------+---------+
| date       | sales   |
+------------+---------+
| 2020-02-01 |  500.49 |
| 2020-02-02 | 1249.00 |
| 2020-02-03 |    0.00 |
| 2020-02-04 | 1199.00 |
| 2020-02-05 |    0.00 |
| 2020-02-06 | 1319.40 |
| 2020-02-07 |  609.00 |
+------------+---------+

 

Hierarchical Data Traversal

Let’s take a look now at some other use cases for recursive CTEs: a simple tree for an org chart, a more complex tree for family genealogy, and a graph for train routes.

A Simple Tree: Org Chart

 

# create the table
CREATE TABLE org_chart(
id INT PRIMARY KEY,
name VARCHAR(20),
role VARCHAR(20),
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES orgchart(id));

# insert the rows
INSERT INTO org_chart VALUES(1,'Matthew','CEO',NULL), 
(2,'Caroline','CFO',1),(3,'Tom','CTO',1),
(4,'Sam','Treasurer',2),(5,'Ann','Controller',2),
(6,'Anthony','Dev Director',3),(7,'Lousie','Sys Admin',3),
(8,'Travis','Senior DBA',3),(9,'John','Developer',6),
(10,'Jennifer','Developer',6),(11,'Maria','Junior DBA',8);

# let's see the table; the CEO has no manager, so the manager_id is set to NULL
SELECT * FROM org_chart;
+----+----------+--------------+------------+
| id | name     | role         | manager_id |
+----+----------+--------------+------------+
|  1 | Matthew  | CEO          |       NULL |
|  2 | Caroline | CFO          |          1 |
|  3 | Tom      | CTO          |          1 |
|  4 | Sam      | Treasurer    |          2 |
|  5 | Ann      | Controller   |          2 |
|  6 | Anthony  | Dev Director |          3 |
|  7 | Lousie   | Sys Admin    |          3 |
|  8 | Travis   | Senior DBA   |          3 |
|  9 | John     | Developer    |          6 |
| 10 | Jennifer | Developer    |          6 |
| 11 | Maria    | Junior DBA   |          8 |
+----+----------+--------------+------------+

 

Let’s run some queries using recursive CTE to traverse this kind of hierarchy.

# find the reporting chain for all the employees
mysql> WITH RECURSIVE reporting_chain(id, name, path) AS ( 
          SELECT id, name, CAST(name AS CHAR(100))  
          FROM org_chart 
          WHERE manager_id IS NULL 
          UNION ALL 
          SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name) 
          FROM reporting_chain rc JOIN org_chart oc ON rc.id=oc.manager_id) 
       SELECT * FROM reporting_chain;
+------+----------+---------------------------------------+
| id   | name     | path                                  |
+------+----------+---------------------------------------+
|    1 | Matthew  | Matthew                               |
|    2 | Caroline | Matthew -> Caroline                   |
|    3 | Tom      | Matthew -> Tom                        |
|    4 | Sam      | Matthew -> Caroline -> Sam            |
|    5 | Ann      | Matthew -> Caroline -> Ann            |
|    6 | Anthony  | Matthew -> Tom -> Anthony             |
|    7 | Lousie   | Matthew -> Tom -> Lousie              |
|    8 | Travis   | Matthew -> Tom -> Travis              |
|    9 | John     | Matthew -> Tom -> Anthony -> John     |
|   10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer |
|   11 | Maria    | Matthew -> Tom -> Travis -> Maria     |
+------+----------+---------------------------------------+

Please note the usage of the CAST function in the “seed” member of the CTE. This was done on purpose. Let’s look at what happens if you don’t use the CAST function:

mysql> WITH RECURSIVE reporting_chain(id, name, path) AS ( 
          SELECT id, name, name 
          FROM org_chart 
          WHERE manager_id IS NULL 
          UNION ALL 
          SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name) 
          FROM reporting_chain rc JOIN org_chart oc ON rc.id=oc.manager_id) 
       SELECT * FROM reporting_chain;
ERROR 1406 (22001): Data too long for column 'path' at row 1

Why an error? The query is, in theory, correct, but the problem is that the type of the column path is determined from the non-recursive SELECT only, so it is CHAR(7) (the length of 'Matthew'). The recursive part of the CTE would then cause a character truncation, hence the error.

Let’s look at a query to traverse the tree and calculate the level of the employees in the Org Chart.

mysql> WITH RECURSIVE reporting_chain(id, name, path, level) AS ( 
          SELECT id, name, CAST(name AS CHAR(100)), 1  
          FROM org_chart 
          WHERE manager_id IS NULL 
          UNION ALL 
          SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name), rc.level+1 
          FROM reporting_chain rc JOIN org_chart oc ON rc.id=oc.manager_id) 
       SELECT * FROM reporting_chain ORDER BY level;
+------+----------+---------------------------------------+-------+
| id   | name     | path                                  | level |
+------+----------+---------------------------------------+-------+
|    1 | Matthew  | Matthew                               |     1 |
|    2 | Caroline | Matthew -> Caroline                   |     2 |
|    3 | Tom      | Matthew -> Tom                        |     2 |
|    4 | Sam      | Matthew -> Caroline -> Sam            |     3 |
|    5 | Ann      | Matthew -> Caroline -> Ann            |     3 |
|    6 | Anthony  | Matthew -> Tom -> Anthony             |     3 |
|    7 | Lousie   | Matthew -> Tom -> Lousie              |     3 |
|    8 | Travis   | Matthew -> Tom -> Travis              |     3 |
|    9 | John     | Matthew -> Tom -> Anthony -> John     |     4 |
|   10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer |     4 |
|   11 | Maria    | Matthew -> Tom -> Travis -> Maria     |     4 |
+------+----------+---------------------------------------+-------+

 

A More Complex Tree: Genealogy

Let’s create a table to represent the following genealogy with grandparents, parents, and children.

CREATE TABLE genealogy(
id INT PRIMARY KEY,
name VARCHAR(20),
father_id INT,
mother_id INT,
FOREIGN KEY(father_id) REFERENCES genealogy(id),
FOREIGN KEY(mother_id) REFERENCES genealogy(id));

# populate the table
INSERT INTO genealogy VALUES(1,'Maria',NULL,NULL),
(2,'Tom',NULL,NULL),(3,'Robert',NULL,NULL),
(4,'Claire',NULL,NULL),(5,'John',2,1),
(6,'Jennifer',2,1),(7,'Sam',3,4),
(8,'James',7,6);

SELECT * FROM genealogy;
+----+----------+-----------+-----------+
| id | name     | father_id | mother_id |
+----+----------+-----------+-----------+
|  1 | Maria    |      NULL |      NULL |
|  2 | Tom      |      NULL |      NULL |
|  3 | Robert   |      NULL |      NULL |
|  4 | Claire   |      NULL |      NULL |
|  5 | John     |         2 |         1 |
|  6 | Jennifer |         2 |         1 |
|  7 | Sam      |         3 |         4 |
|  8 | James    |         7 |         6 |
+----+----------+-----------+-----------+

Let’s find all of James’s ancestors and the relationship:

mysql> WITH RECURSIVE ancestors AS ( 
          SELECT *, CAST('son' AS CHAR(20)) AS relationship, 0 level 
          FROM genealogy  
          WHERE name='James' 
          UNION ALL 
          SELECT g.*, CASE WHEN g.id=a.father_id AND level=0 THEN 'father' 
                           WHEN g.id=a.mother_id AND level=0 THEN 'mother' 
                           WHEN g.id=a.father_id AND level=1 THEN 'grandfather' 
                           WHEN g.id=a.mother_id AND level=1 THEN 'grandmother' 
                       END,
                       level+1 
           FROM genealogy g, ancestors a 
           WHERE g.id=a.father_id OR g.id=a.mother_id) 
        SELECT * FROM ancestors;
+------+----------+-----------+-----------+--------------+-------+
| id   | name     | father_id | mother_id | relationship | level |
+------+----------+-----------+-----------+--------------+-------+
|    8 | James    |         7 |         6 | son          |     0 |
|    6 | Jennifer |         2 |         1 | mother       |     1 |
|    7 | Sam      |         3 |         4 | father       |     1 |
|    1 | Maria    |      NULL |      NULL | grandmother  |     2 |
|    2 | Tom      |      NULL |      NULL | grandfather  |     2 |
|    3 | Robert   |      NULL |      NULL | grandfather  |     2 |
|    4 | Claire   |      NULL |      NULL | grandmother  |     2 |
+------+----------+-----------+-----------+--------------+-------+

Using the same query but changing the initial condition we can find out the ancestors of anyone in the hierarchy, for example, Jennifer:

mysql> WITH RECURSIVE ancestors AS ( 
          SELECT *, CAST('daughter' AS CHAR(20)) AS relationship, 0 level 
          FROM genealogy 
          WHERE name='Jennifer' 
          UNION ALL 
          SELECT g.*, CASE WHEN g.id=a.father_id AND level=0 THEN 'father' 
                           WHEN g.id=a.mother_id AND level=0 THEN 'mother' 
                           WHEN g.id=a.father_id AND level=1 THEN 'grandfather' 
                           WHEN g.id=a.mother_id AND level=1 THEN 'grandmother' 
                      END, 
                      level+1 
           FROM genealogy g, ancestors a 
           WHERE g.id=a.father_id OR g.id=a.mother_id) 
        SELECT * FROM ancestors;
+------+----------+-----------+-----------+--------------+-------+
| id   | name     | father_id | mother_id | relationship | level |
+------+----------+-----------+-----------+--------------+-------+
|    6 | Jennifer |         2 |         1 | daughter     |     0 |
|    1 | Maria    |      NULL |      NULL | mother       |     1 |
|    2 | Tom      |      NULL |      NULL | father       |     1 |
+------+----------+-----------+-----------+--------------+-------+

 

A Graph: Train Routes

Let's create a graph representing train routes in Italy between some of the more important cities.

 

Note that some connections are uni-directional while others are bi-directional. Each connection also has a distance in km.

CREATE TABLE train_route(
id INT PRIMARY KEY,
origin VARCHAR(20),
destination VARCHAR(20),
distance INT);

# populate the table
INSERT INTO train_route VALUES(1,'MILAN','TURIN',150),
(2,'TURIN','MILAN',150),(3,'MILAN','VENICE',250),
(4,'VENICE','MILAN',250),(5,'MILAN','GENOA',200),
(6,'MILAN','ROME',600),(7,'ROME','MILAN',600),
(8,'MILAN','FLORENCE',380),(9,'TURIN','GENOA',160),
(10,'GENOA','TURIN',160),(11,'FLORENCE','VENICE',550),
(12,'FLORENCE','ROME',220),(13,'ROME','FLORENCE',220),
(14,'GENOA','ROME',500),(15,'ROME','NAPLES',210),
(16,'NAPLES','VENICE',800);

SELECT * FROM train_route;
+----+----------+-------------+----------+
| id | origin   | destination | distance |
+----+----------+-------------+----------+
|  1 | MILAN    | TURIN       |      150 |
|  2 | TURIN    | MILAN       |      150 |
|  3 | MILAN    | VENICE      |      250 |
|  4 | VENICE   | MILAN       |      250 |
|  5 | MILAN    | GENOA       |      200 |
|  6 | MILAN    | ROME        |      600 |
|  7 | ROME     | MILAN       |      600 |
|  8 | MILAN    | FLORENCE    |      380 |
|  9 | TURIN    | GENOA       |      160 |
| 10 | GENOA    | TURIN       |      160 |
| 11 | FLORENCE | VENICE      |      550 |
| 12 | FLORENCE | ROME        |      220 |
| 13 | ROME     | FLORENCE    |      220 |
| 14 | GENOA    | ROME        |      500 |
| 15 | ROME     | NAPLES      |      210 |
| 16 | NAPLES   | VENICE      |      800 |
+----+----------+-------------+----------+

Returning all the train destinations with Milan as the origin:

mysql> WITH RECURSIVE train_destination AS ( 
          SELECT origin AS dest 
          FROM train_route 
          WHERE origin='MILAN'  
          UNION  
          SELECT tr.destination 
          FROM train_route tr 
          JOIN train_destination td ON td.dest=tr.origin) 
       SELECT * from train_destination;
+----------+
| dest     |
+----------+
| MILAN    |
| TURIN    |
| VENICE   |
| GENOA    |
| ROME     |
| FLORENCE |
| NAPLES   |
+----------+

Basically, starting from any city you can reach any other city in Italy, but via different paths. So let's run a query to find all the possible paths, and the total length of each, starting first from Milan and then from Naples.

mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 
          FROM train_route 
          WHERE origin='MILAN'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin 
           AND  NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths;
+-------------------------------------------------------+----------+--------------+
| cur_path                                              | cur_dest | tot_distance |
+-------------------------------------------------------+----------+--------------+
| MILAN                                                 | MILAN    |            0 |
| MILAN -> TURIN                                        | TURIN    |          150 |
| MILAN -> VENICE                                       | VENICE   |          250 |
| MILAN -> GENOA                                        | GENOA    |          200 |
| MILAN -> ROME                                         | ROME     |          600 |
| MILAN -> FLORENCE                                     | FLORENCE |          380 |
| MILAN -> TURIN -> GENOA                               | GENOA    |          310 |
| MILAN -> GENOA -> TURIN                               | TURIN    |          360 |
| MILAN -> GENOA -> ROME                                | ROME     |          700 |
| MILAN -> ROME -> FLORENCE                             | FLORENCE |          820 |
| MILAN -> ROME -> NAPLES                               | NAPLES   |          810 |
| MILAN -> FLORENCE -> VENICE                           | VENICE   |          930 |
| MILAN -> FLORENCE -> ROME                             | ROME     |          600 |
| MILAN -> TURIN -> GENOA -> ROME                       | ROME     |          810 |
| MILAN -> GENOA -> ROME -> FLORENCE                    | FLORENCE |          920 |
| MILAN -> GENOA -> ROME -> NAPLES                      | NAPLES   |          910 |
| MILAN -> ROME -> FLORENCE -> VENICE                   | VENICE   |         1370 |
| MILAN -> ROME -> NAPLES -> VENICE                     | VENICE   |         1610 |
| MILAN -> FLORENCE -> ROME -> NAPLES                   | NAPLES   |          810 |
| MILAN -> TURIN -> GENOA -> ROME -> FLORENCE           | FLORENCE |         1030 |
| MILAN -> TURIN -> GENOA -> ROME -> NAPLES             | NAPLES   |         1020 |
| MILAN -> GENOA -> ROME -> FLORENCE -> VENICE          | VENICE   |         1470 |
| MILAN -> GENOA -> ROME -> NAPLES -> VENICE            | VENICE   |         1710 |
| MILAN -> FLORENCE -> ROME -> NAPLES -> VENICE         | VENICE   |         1610 |
| MILAN -> TURIN -> GENOA -> ROME -> FLORENCE -> VENICE | VENICE   |         1580 |
| MILAN -> TURIN -> GENOA -> ROME -> NAPLES -> VENICE   | VENICE   |         1820 |
+-------------------------------------------------------+----------+--------------+


mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 
          FROM train_route 
          WHERE origin='NAPLES'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin 
            AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths;
+-----------------------------------------------------------------+----------+--------------+
| cur_path                                                        | cur_dest | tot_distance |
+-----------------------------------------------------------------+----------+--------------+
| NAPLES                                                          | NAPLES   |            0 |
| NAPLES -> VENICE                                                | VENICE   |          800 |
| NAPLES -> VENICE -> MILAN                                       | MILAN    |         1050 |
| NAPLES -> VENICE -> MILAN -> TURIN                              | TURIN    |         1200 |
| NAPLES -> VENICE -> MILAN -> GENOA                              | GENOA    |         1250 |
| NAPLES -> VENICE -> MILAN -> ROME                               | ROME     |         1650 |
| NAPLES -> VENICE -> MILAN -> FLORENCE                           | FLORENCE |         1430 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA                     | GENOA    |         1360 |
| NAPLES -> VENICE -> MILAN -> GENOA -> TURIN                     | TURIN    |         1410 |
| NAPLES -> VENICE -> MILAN -> GENOA -> ROME                      | ROME     |         1750 |
| NAPLES -> VENICE -> MILAN -> ROME -> FLORENCE                   | FLORENCE |         1870 |
| NAPLES -> VENICE -> MILAN -> FLORENCE -> ROME                   | ROME     |         1650 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA -> ROME             | ROME     |         1860 |
| NAPLES -> VENICE -> MILAN -> GENOA -> ROME -> FLORENCE          | FLORENCE |         1970 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA -> ROME -> FLORENCE | FLORENCE |         2080 |
+-----------------------------------------------------------------+----------+--------------+

 

It's now quite easy to find the shortest path from one origin to any final destination: you just need to filter and order the main query. Here are some examples:

# shortest path from MILAN to NAPLES
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin='MILAN'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths 
       WHERE cur_dest='NAPLES' 
       ORDER BY tot_distance ASC LIMIT 1;
+-------------------------+----------+--------------+
| cur_path                | cur_dest | tot_distance |
+-------------------------+----------+--------------+
| MILAN -> ROME -> NAPLES | NAPLES   |          810 |
+-------------------------+----------+--------------+

# shortest path from VENICE to GENOA
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin='VENICE'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths 
       WHERE cur_dest='GENOA' 
       ORDER BY tot_distance ASC LIMIT 1;
+--------------------------+----------+--------------+
| cur_path                 | cur_dest | tot_distance |
+--------------------------+----------+--------------+
| VENICE -> MILAN -> GENOA | GENOA    |          450 |
+--------------------------+----------+--------------+

# shortest path from VENICE to NAPLES
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin='VENICE'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths 
       WHERE cur_dest='NAPLES' 
       ORDER BY tot_distance ASC LIMIT 1;
+-----------------------------------+----------+--------------+
| cur_path                          | cur_dest | tot_distance |
+-----------------------------------+----------+--------------+
| VENICE -> MILAN -> ROME -> NAPLES | NAPLES   |         1060 |
+-----------------------------------+----------+--------------+

 

Limitations

Apart from the options we have already seen for limiting the execution time and the number of iterations, there are other built-in limitations you should be aware of.

The recursive SELECT must not contain the following constructs:

  • An aggregate function such as SUM()
  • GROUP BY
  • ORDER BY
  • DISTINCT
  • Window functions

These limitations do not apply to non-recursive CTEs. Also, the recursive SELECT part must reference the CTE only once and only in its FROM clause, not in any subquery.
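
These restrictions apply only to the recursive SELECT itself; aggregation and sorting are still allowed in the outer query that consumes the CTE. As a minimal sketch, reusing the genealogy table from above:

# SUM(), GROUP BY, ORDER BY, etc. are rejected inside the recursive SELECT,
# but they are perfectly fine in the outer query reading from the CTE:
WITH RECURSIVE ancestors AS (
    SELECT id, name, father_id, mother_id, 0 AS level
    FROM genealogy
    WHERE name = 'James'
    UNION ALL
    SELECT g.id, g.name, g.father_id, g.mother_id, a.level + 1
    FROM genealogy g, ancestors a
    WHERE g.id = a.father_id OR g.id = a.mother_id)
SELECT level, COUNT(*) AS ancestors_at_level
FROM ancestors
GROUP BY level
ORDER BY level;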

Conclusion

Recursive common table expressions are an interesting new feature of MySQL 8.0 for implementing queries in your applications. Recursion was already possible in the past by creating stored routines, but now it's much simpler. Furthermore, you don't need any special or additional grants to create a recursive query.

Generally, recursive CTEs are quite simple, although a little more complicated than non-recursive CTEs. The tricky part is, obviously, the recursion itself: it's not a matter of syntax, it's only a matter of "thinking recursively".

How to Run Orchestrator on FreeBSD


In this post, I am going to show you how to run Orchestrator on FreeBSD. The instructions have been tested in FreeBSD 11.3 but the general steps should apply to other versions as well.

At the time of this writing, Orchestrator doesn’t provide FreeBSD binaries, so we will need to compile it.

Preparing the Environment

The first step is to install the prerequisites. Let’s start by installing git:

[vagrant@freebsd ~]$ sudo pkg update
Updating FreeBSD repository catalogue...
Fetching meta.txz: 100% 944 B 0.9kB/s 00:01
Fetching packagesite.txz: 100% 6 MiB 492.3kB/s 00:13
Processing entries: 100%
FreeBSD repository update completed. 31526 packages processed.
All repositories are up to date.

[vagrant@freebsd ~]$ sudo pkg install git
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
New version of pkg detected; it needs to be installed first.
The following 1 package(s) will be affected (of 0 checked):

Installed packages to be UPGRADED:
pkg: 1.12.0 -> 1.12.0_1

Number of packages to be upgraded: 1

3 MiB to be downloaded.

Proceed with this action? [y/N]: y
[1/1] Fetching pkg-1.12.0_1.txz: 100%    3 MiB 324.6kB/s    00:11
Checking integrity... done (0 conflicting)
[1/1] Upgrading pkg from 1.12.0 to 1.12.0_1...
[1/1] Extracting pkg-1.12.0_1: 100%
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 30 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
git: 2.25.0
p5-CGI: 4.45
p5-HTML-Parser: 3.72
p5-HTML-Tagset: 3.20_1
perl5: 5.30.1
expat: 2.2.8
p5-IO-Socket-SSL: 2.066
p5-Mozilla-CA: 20180117
p5-Net-SSLeay: 1.88
p5-IO-Socket-INET6: 2.72_1
p5-Socket6: 0.29
p5-Authen-SASL: 2.16_1
p5-GSSAPI: 0.28_1
p5-Digest-HMAC: 1.03_1
python37: 3.7.6
readline: 8.0.1
libffi: 3.2.1_3
p5-Error: 0.17029
pcre: 8.43_2
p5-subversion: 1.13.0
utf8proc: 2.4.0
subversion: 1.13.0
serf: 1.3.9_4
apr: 1.7.0.1.6.1
gdbm: 1.18.1_1
db5: 5.3.28_7
sqlite3: 3.30.1,1
liblz4: 1.9.2_1,1
p5-Term-ReadKey: 2.38_1
cvsps: 2.1_2

Number of packages to be installed: 30

The process will require 297 MiB more space.
57 MiB to be downloaded.

Proceed with this action? [y/N]: y
[1/30] Fetching git-2.25.0.txz: 100%    6 MiB 199.8kB/s    00:29
[2/30] Fetching p5-CGI-4.45.txz: 100%  154 KiB 158.1kB/s    00:01
[3/30] Fetching p5-HTML-Parser-3.72.txz: 100%   80 KiB  81.8kB/s    00:01
[4/30] Fetching p5-HTML-Tagset-3.20_1.txz: 100%   12 KiB  12.0kB/s    00:01
[5/30] Fetching perl5-5.30.1.txz: 100%   14 MiB 242.0kB/s    01:02
[6/30] Fetching expat-2.2.8.txz: 100%  124 KiB 127.2kB/s    00:01
...
[30/30] Fetching cvsps-2.1_2.txz: 100%   43 KiB  44.1kB/s    00:01
Checking integrity... done (0 conflicting)
[1/30] Installing readline-8.0.1...
[1/30] Extracting readline-8.0.1: 100%
[2/30] Installing expat-2.2.8...
[2/30] Extracting expat-2.2.8: 100%
[3/30] Installing gdbm-1.18.1_1...
[3/30] Extracting gdbm-1.18.1_1: 100%
[4/30] Installing db5-5.3.28_7...
[4/30] Extracting db5-5.3.28_7: 100%
[5/30] Installing perl5-5.30.1...
[5/30] Extracting perl5-5.30.1: 100%
...
[30/30] Installing git-2.25.0...
===> Creating groups.
Creating group 'git_daemon' with gid '964'.
===> Creating users
Creating user 'git_daemon' with uid '964'.
[30/30] Extracting git-2.25.0: 100%
=====
Message from perl5-5.30.1:

--
The /usr/bin/perl symlink has been removed starting with Perl 5.20.
For shebangs, you should either use:

#!/usr/local/bin/perl

or

#!/usr/bin/env perl

The first one will only work if you have a /usr/local/bin/perl,
the second will work as long as perl is in PATH.
=====
Message from apr-1.7.0.1.6.1:

--
The Apache Portable Runtime project removed support for FreeTDS with
version 1.6. Users requiring MS-SQL connectivity must migrate
configurations to use the added ODBC driver and FreeTDS' ODBC features.
=====
Message from python37-3.7.6:

--
Note that some standard Python modules are provided as separate ports
as they require additional dependencies. They are available as:

py37-gdbm       databases/py-gdbm@py37
py37-sqlite3    databases/py-sqlite3@py37
py37-tkinter    x11-toolkits/py-tkinter@py37
=====
Message from git-2.25.0:

--
If you installed the GITWEB option please follow these instructions:

In the directory /usr/local/share/examples/git/gitweb you can find all files to
make gitweb work as a public repository on the web.

All you have to do to make gitweb work is:
1) Please be sure you're able to execute CGI scripts in
   /usr/local/share/examples/git/gitweb.
2) Set the GITWEB_CONFIG variable in your webserver's config to
   /usr/local/etc/git/gitweb.conf. This variable is passed to gitweb.cgi.
3) Restart server.


If you installed the CONTRIB option please note that the scripts are
installed in /usr/local/share/git-core/contrib. Some of them require
other ports to be installed (perl, python, etc), which you may need to
install manually.

We also need the go and rsync packages in order to compile:

[vagrant@freebsd ~/orchestrator]$ sudo pkg install go
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
go: 1.13.7,1

Number of packages to be installed: 1

The process will require 266 MiB more space.
75 MiB to be downloaded.

Proceed with this action? [y/N]: y
[1/1] Fetching go-1.13.7,1.txz: 100%   75 MiB 245.2kB/s    05:22
Checking integrity... done (0 conflicting)
[1/1] Installing go-1.13.7,1...
[1/1] Extracting go-1.13.7,1: 100%

[vagrant@freebsd ~/orchestrator]$ sudo pkg install rsync
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 2 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
rsync: 3.1.3_1
libiconv: 1.14_11

Number of packages to be installed: 2

The process will require 3 MiB more space.
916 KiB to be downloaded.

Proceed with this action? [y/N]: y
[1/2] Fetching rsync-3.1.3_1.txz: 100%  308 KiB 315.9kB/s    00:01
[2/2] Fetching libiconv-1.14_11.txz: 100%  608 KiB 311.1kB/s    00:02
Checking integrity... done (0 conflicting)
[1/2] Installing libiconv-1.14_11...
[1/2] Extracting libiconv-1.14_11: 100%
[2/2] Installing rsync-3.1.3_1...
[2/2] Extracting rsync-3.1.3_1: 100%

Compiling Orchestrator

Now it is time to clone the repository from GitHub:

[vagrant@freebsd ~]$ git clone https://github.com/github/orchestrator.git
Cloning into 'orchestrator'...
remote: Enumerating objects: 56, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 28442 (delta 17), reused 32 (delta 10), pack-reused 28386
Receiving objects: 100% (28442/28442), 15.52 MiB | 3.60 MiB/s, done.
Resolving deltas: 100% (18006/18006), done.

The next step is to compile Orchestrator. According to the manual, we need to run script/build as follows:

[vagrant@freebsd ~]$ cd orchestrator/
[vagrant@freebsd ~/orchestrator]$ script/build
go version go1.13.7 freebsd/amd64 found in : Go Binary: /usr/local/bin/go
++ rm -rf .gopath
++ mkdir -p .gopath/src/github.com/github
++ ln -s /home/vagrant/orchestrator .gopath/src/github.com/github/orchestrator
++ export GOPATH=/home/vagrant/orchestrator/.gopath:/home/vagrant/orchestrator/.vendor
++ GOPATH=/home/vagrant/orchestrator/.gopath:/home/vagrant/orchestrator/.vendor
+ mkdir -p bin
+ bindir=/home/vagrant/orchestrator/bin
+ scriptdir=/home/vagrant/orchestrator/script
++ git rev-parse HEAD
+ version=548265494b3107ca2581d6ccee059e062a759b77
++ git describe --tags --always --dirty
+ describe=v3.1.4-2-g54826549
+ export GOPATH=/home/vagrant/orchestrator/.gopath
+ GOPATH=/home/vagrant/orchestrator/.gopath
+ cd .gopath/src/github.com/github/orchestrator
+ go build -i -o /home/vagrant/orchestrator/bin/orchestrator -ldflags '-X main.AppVersion=548265494b3107ca2581d6ccee059e062a759b77 -X main.BuildDescribe=v3.1.4-2-g54826549' ./go/cmd/orchestrator/main.go
+ rsync -qa ./resources /home/vagrant/orchestrator/bin/

Installation

Now, we have to move the compiled binary (and the additional files) to the final destination directory. In this case, I chose to use /usr/local/orchestrator.

[vagrant@freebsd ~/orchestrator]$ sudo mkdir -p /usr/local/orchestrator
[vagrant@freebsd ~/orchestrator]$ sudo mv /home/vagrant/orchestrator/bin/orchestrator /usr/local/orchestrator/
[vagrant@freebsd ~/orchestrator]$ sudo mv /home/vagrant/orchestrator/bin/resources /usr/local/orchestrator/

We also need to create an init script and an operating system user for Orchestrator, for example:

[vagrant@freebsd ~/orchestrator]$ sudo pw useradd orchestrator -s /usr/sbin/nologin
[vagrant@freebsd ~/orchestrator]$ sudo vi /usr/local/etc/rc.d/orchestrator

#!/bin/sh

# PROVIDE: orchestrator
# REQUIRE: LOGIN
# KEYWORD: shutdown

#
# Add the following line to /etc/rc.conf to enable orchestrator:
# orchestrator_enable (bool):  Set to "NO" by default.
#                       Set it to "YES" to enable orchestrator.
# orchestrator_dir (str):  Default to "/usr/local/orchestrator"
#                       Base configuration directory.
# orchestrator_pidfile (str):  Custom PID file path and name.
#                       Default to "${orchestrator_dir}/${hostname}.pid".
# orchestrator_args (str):     Custom additional arguments to be passed
#                       to orchestrator (default --verbose http).
#

. /etc/rc.subr

name="orchestrator"
rcvar=orchestrator_enable

load_rc_config $name

: ${orchestrator_enable="NO"}
: ${orchestrator_dir="/usr/local/orchestrator"}
: ${orchestrator_args="--verbose http"}

orchestrator_user="orchestrator"
: ${hostname:=`/bin/hostname`}
pidfile=${orchestrator_pidfile:-"${orchestrator_dir}/${hostname}.pid"}
procname="/usr/local/orchestrator/orchestrator"
command="/usr/sbin/daemon"
command_args="-p ${pidfile} ${orchestrator_dir}/orchestrator ${orchestrator_args} >> /var/log/${name}.log 2>&1"

run_rc_command "$1"

We shouldn’t forget to set the proper permissions for the init script and the orchestrator directory:

[vagrant@freebsd ~/orchestrator]$ sudo chmod +x /usr/local/etc/rc.d/orchestrator 
[vagrant@freebsd ~/orchestrator]$ sudo chown -R orchestrator: /usr/local/orchestrator

Now let's create a sample configuration file using the template available on GitHub:

[vagrant@freebsd ~/orchestrator]$ sudo vi /etc/orchestrator.conf.json

{
"Debug": true,
"EnableSyslog": false,
"ListenAddress": ":3000",
"MySQLTopologyUser": "orc_client_user",
"MySQLTopologyPassword": "orc_client_password",
"MySQLTopologyCredentialsConfigFile": "",
"MySQLTopologySSLPrivateKeyFile": "",
"MySQLTopologySSLCertFile": "",
"MySQLTopologySSLCAFile": "",
"MySQLTopologySSLSkipVerify": true,
"MySQLTopologyUseMutualTLS": false,
"BackendDB": "sqlite",
"SQLite3DataFile": "/usr/local/orchestrator/orchestrator.sqlite3",
"DefaultInstancePort": 3306,
"DiscoverByShowSlaveHosts": true,
"InstancePollSeconds": 5,
"DiscoveryIgnoreReplicaHostnameFilters": [
"a_host_i_want_to_ignore[.]example[.]com",
".*[.]ignore_all_hosts_from_this_domain[.]example[.]com"
],
"UnseenInstanceForgetHours": 240,
"SnapshotTopologiesIntervalHours": 0,
"InstanceBulkOperationsWaitTimeoutSeconds": 10,
"HostnameResolveMethod": "default",
"MySQLHostnameResolveMethod": "@@hostname",
"SkipBinlogServerUnresolveCheck": true,
"ExpiryHostnameResolvesMinutes": 60,
"RejectHostnameResolvePattern": "",
"ReasonableReplicationLagSeconds": 10,
"ProblemIgnoreHostnameFilters": [],
"VerifyReplicationFilters": false,
"ReasonableMaintenanceReplicationLagSeconds": 20,
"CandidateInstanceExpireMinutes": 60,
"AuditLogFile": "",
"AuditToSyslog": false,
"RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
"ReadOnly": false,
"AuthenticationMethod": "",
"HTTPAuthUser": "",
"HTTPAuthPassword": "",
"AuthUserHeader": "",
"PowerAuthUsers": [
"*"
],
"ClusterNameToAlias": {
"127.0.0.1": "test suite"
},
"SlaveLagQuery": "",
"DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
"DetectClusterDomainQuery": "",
"DetectInstanceAliasQuery": "",
"DetectPromotionRuleQuery": "",
"DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
"PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
"PromotionIgnoreHostnameFilters": [],
"DetectSemiSyncEnforcedQuery": "",
"ServeAgentsHttp": false,
"AgentsServerPort": ":3001",
"AgentsUseSSL": false,
"AgentsUseMutualTLS": false,
"AgentSSLSkipVerify": false,
"AgentSSLPrivateKeyFile": "",
"AgentSSLCertFile": "",
"AgentSSLCAFile": "",
"AgentSSLValidOUs": [],
"UseSSL": false,
"UseMutualTLS": false,
"SSLSkipVerify": false,
"SSLPrivateKeyFile": "",
"SSLCertFile": "",
"SSLCAFile": "",
"SSLValidOUs": [],
"URLPrefix": "",
"StatusEndpoint": "/api/status",
"StatusSimpleHealth": true,
"StatusOUVerify": false,
"AgentPollMinutes": 60,
"UnseenAgentForgetHours": 6,
"StaleSeedFailMinutes": 60,
"SeedAcceptableBytesDiff": 8192,
"PseudoGTIDPattern": "",
"PseudoGTIDPatternIsFixedSubstring": false,
"PseudoGTIDMonotonicHint": "asc:",
"DetectPseudoGTIDQuery": "",
"BinlogEventsChunkSize": 10000,
"SkipBinlogEventsContaining": [],
"ReduceReplicationAnalysisCount": true,
"FailureDetectionPeriodBlockMinutes": 60,
"RecoveryPeriodBlockSeconds": 3600,
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": [
"_master_pattern_"
],
"RecoverIntermediateMasterClusterFilters": [
"_intermediate_master_pattern_"
],
"OnFailureDetectionProcesses": [
"echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
],
"PreFailoverProcesses": [
"echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
],
"PostFailoverProcesses": [
"echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostUnsuccessfulFailoverProcesses": [],
"PostMasterFailoverProcesses": [
"echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostIntermediateMasterFailoverProcesses": [
"echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"CoMasterRecoveryMustPromoteOtherCoMaster": true,
"DetachLostSlavesAfterMasterFailover": true,
"ApplyMySQLPromotionAfterMasterFailover": true,
"PreventCrossDataCenterMasterFailover": false,
"MasterFailoverDetachSlaveMasterHost": false,
"MasterFailoverLostInstancesDowntimeMinutes": 0,
"PostponeSlaveRecoveryOnLagMinutes": 0,
"OSCIgnoreHostnameFilters": [],
"GraphiteAddr": "",
"GraphitePath": "",
"GraphiteConvertHostnameDotsToUnderscores": true
}
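
The configuration above references a topology user (orc_client_user) that must exist on every MySQL server Orchestrator will monitor. As a minimal sketch (the grants follow the upstream Orchestrator documentation; verify them against the version you compiled, and adjust the host pattern and password to your environment):

CREATE USER 'orc_client_user'@'%' IDENTIFIED BY 'orc_client_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orc_client_user'@'%';
GRANT SELECT ON mysql.slave_master_info TO 'orc_client_user'@'%';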

Finally, let’s configure Orchestrator to start at boot time, and start the service:

[vagrant@freebsd ~/orchestrator]$ echo 'orchestrator_enable="YES"' | sudo tee -a /etc/rc.conf
[vagrant@freebsd ~/orchestrator]$ sudo service orchestrator start
Starting orchestrator.

Happy Orchestrating!


PMM Optimizations, Updated Percona Server for MongoDB, proxysql-admin Tool: Release Roundup 2/17/2020


It's release roundup time here at Percona!

Our Release Roundups showcase the latest software updates, tools, and features to help you manage and deploy our software, with highlights and critical information, as well as links to the full release notes and direct links to the software or service itself.

Today’s post includes those which have come out since February 3, 2020, including the optimization of Query Analytics parser code for PostgreSQL queries in Percona Monitoring and Management, bug fixes for Percona Server for MySQL, and updates for Percona Server for MongoDB.

 

Percona Monitoring and Management 2.2.2

On February 4, 2020, we released Percona Monitoring and Management 2.2.2. It is a free and open-source platform for managing and monitoring MySQL, MongoDB, and PostgreSQL performance. Improvements include a new --skip-server flag which makes it operate in a local-only mode, as well as several bug fixes, including a correction for the Scraping Time Drift graph on the Prometheus dashboard, where it was showing the wrong values because the actual metrics resolution wasn't taken into account.

Download Percona Monitoring and Management 2.2.2

 

Percona Server for MySQL 5.7.29-32

Percona Server for MySQL 5.7.29-32 was released on February 5, 2020. It is a free, fully compatible, enhanced and open source drop-in replacement for any MySQL database. This version includes several bug fixes, including PS-1469, where the Memory storage engine detected an incorrect "is full" condition; PS-5813, where setting slow_query_log_use_global_control to "none" could cause an error; and PS-6123, where a Debian/Ubuntu init script used an incorrect comparison which could cause the service command to return before the server started.

Download Percona Server for MySQL 5.7.29-32 

 

Percona Server for MongoDB 3.4.24-3.0

On February 6, 2020, Percona Server for MongoDB 3.4.24-3.0 was released. It is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition while extending functionality by including the Percona Memory Engine and MongoRocks storage engines, as well as several enterprise-grade features such as External Authentication, Hot Backups, and Log Redaction. This release is based on MongoDB 3.4.24, and there are no additional improvements or new features on top of the changes in the upstream version.

Download Percona Server for MongoDB 3.4.24-3.0

 

ProxySQL 1.4.16 and Updated proxysql-admin Tool

On February 11, 2020, we announced that ProxySQL 1.4.16, released by ProxySQL, is available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool. ProxySQL is a high-performance proxy, currently for MySQL and database servers in the MySQL ecosystem (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. In this version, bug PSQLADM-219 was fixed, where the scheduler was handling the pxc_maint_mode variable incorrectly and open connections were closed immediately. Now the scheduler only sets the node status to OFFLINE_SOFT.

 

Percona Server for MongoDB 3.6.17-4.0

February 13, 2020, saw the release of Percona Server for MongoDB 3.6.17-4.0. Based on MongoDB 3.6.17, this release fixed bug PSMDB-473, where the logApplicationMessage command failed even when it was run by a user with extended privileges. The problem has been fixed to allow running the logApplicationMessage command by any role that has the applicationMessage privilege, such as clusterManager or hostManager.

Download Percona Server for MongoDB 3.6.17-4.0

 

Percona Server for MongoDB 4.0.16-9

On February 17, 2020, we released Percona Server for MongoDB 4.0.16-9. With the Percona Memory Engine in-memory storage engine, HashiCorp Vault integration, Data-at-rest Encryption, audit logging, External LDAP Authentication with SASL, and hot backups, it’s a complete package that maximizes performance and streamlines database efficiencies. This release is based on MongoDB 4.0.16 and does not include any additional changes.

Download Percona Server for MongoDB 4.0.16-9

 

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training and software for MySQL, MariaDB, MongoDB, PostgreSQL, and other open source databases in on-premises and cloud environments.


We understand that choosing open source software for your business can be a potential minefield. You need to select the best available options, which fully support and adapt to your changing needs. Choosing the right open source software can allow you access to enterprise-level features, without the associated costs.

In our white paper, we discuss the key features that make open source software attractive, and why Percona’s software might be the best option for your business.

Download: When is Percona Software the Right Choice?

Configuring ProxySQL Binlog Reader


In a previous post, MySQL High Availability: Stale Reads and How to Fix Them, I've talked about the challenges of scaling out reads, where some types of applications cannot tolerate reading stale data. One of the ways of fixing it is by using ProxySQL Binlog Reader.

Long story short, binlog reader is a lightweight binary that keeps reading binlogs and informing ProxySQL, in real time, about what has been applied on that server. Since MySQL 5.7, as part of the OK_PACKET, the server will also send back information about the generated GTID event to clients. Knowing which GTIDs each server has applied, and what the last generated GTID received by the client connection in the OK_PACKET was, ProxySQL can route the follow-up reads to a server that has already applied that GTID.

At the time of this writing, you will need to compile binlog reader yourself.

Compile Binlog Reader:

To compile it on CentOS 7 you will need a few packages and repos pre-installed:

yum install -y make cmake gcc gcc-c++ epel-release https://dev.mysql.com/get/mysql80-community-release-el7-2.noarch.rpm git wget zlib-devel openssl-devel
yum -y --disablerepo=mysql80-community --enablerepo=mysql57-community install mysql-community-libs-compat-5.7.27-1.el7.x86_64 boost-devel.x86_64 mysql-community-devel-5.7.27-1.el7.x86_64 mysql-community-common-5.7.27-1.el7.x86_64 mysql-community-libs-5.7.27-1.el7.x86_64

Add the hash.h header file into the MySQL include directory:

cd /usr/include/mysql 
MYSQL_VERSION=$(rpm -qa | grep mysql-community-devel | awk -F'-' '{print $4}') 
wget https://raw.githubusercontent.com/mysql/mysql-server/mysql-${MYSQL_VERSION}/include/hash.h
cd

Compile libslave and ProxySQL Binlog Reader:

cd
git clone https://github.com/sysown/proxysql_mysqlbinlog.git 
cd proxysql_mysqlbinlog/libslave/ 
cmake . && make 
cd .. 
ln -s /usr/lib64/mysql/libmysqlclient.a /usr/lib64/libmysqlclient.a 
make
chmod +x proxysql_binlog_reader

 

Running Binlog Reader:

In order to run binlog reader, your mysql server must have boost installed:

yum install epel-release -y
yum install boost-system -y

GTID must be enabled and session_track_gtids configured to OWN_GTID:

gtid_mode=ON
enforce_gtid_consistency
session_track_gtids=OWN_GTID
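
You can quickly confirm these settings on a running server with a simple query (nothing ProxySQL-specific here):

# all three must be enabled / set to OWN_GTID for the reader to work
mysql> SELECT @@gtid_mode, @@enforce_gtid_consistency, @@session_track_gtids;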

To start the binary, you need to specify a port for it to bind to. This port will be later configured on ProxySQL:

./proxysql_binlog_reader -h 127.0.0.1 -u root -psekret -P 3306 -l 3307 -L /tmp/binlogreader.log

At this point, if you inspect the log file, you should see something like:

Starting ProxySQL MySQL Binlog
Sucessfully started
Angel process started ProxySQL MySQL Binlog process 795
2020-02-07 16:15:32 [INFO] Initializing client...
655dfbcb-49c2-11ea-a325-00163ecf1f9c:1-1
2020-02-07 16:15:32 [INFO] Reading binlogs...

Please note, if you see the error below, it means you haven't executed any event that generated a GTID on the server (#Issue7):

Starting ProxySQL MySQL Binlog
Sucessfully started
Angel process started ProxySQL MySQL Binlog process 898
2020-02-07 17:41:34 [INFO] Initializing client...
Error in initializing slave: basic_string::erase
2020-02-07 17:41:34 [INFO] Exiting...
Shutdown angel process
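
If that's your case, running any write that generates a GTID on the server and then restarting the reader should be enough to get past it. For instance (the database name is purely illustrative; any GTID-generating statement will do):

mysql> CREATE DATABASE IF NOT EXISTS gtid_seed;
mysql> DROP DATABASE gtid_seed;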

 

Configuring ProxySQL:

Now it's time to inform ProxySQL of two things:

  1. The servers have proxysql binlog reader running on port X. Assuming you already have your mysql_servers table populated, all you have to do is update each entry to set the GTID port:
    UPDATE mysql_servers SET gtid_port = 3307;
    LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

    Please note, if none of the slaves has received the GTID you are requesting, you want the master to serve this request. For this reason, it is important that your master is part of your read HG (it can have the lowest possible weight). You can achieve that by setting mysql-monitor_writer_is_also_reader to true (the default value).
  2. A particular query rule should enforce GTID consistency from the writer HG. In most cases, this will be the rule that matches the SELECT queries:
    mysql> SELECT rule_id, match_digest, destination_hostgroup, gtid_from_hostgroup FROM mysql_query_rules;
    +---------+---------------------+-----------------------+---------------------+
    | rule_id | match_digest        | destination_hostgroup | gtid_from_hostgroup |
    +---------+---------------------+-----------------------+---------------------+
    | 200     | ^SELECT.*FOR UPDATE | 10                    | NULL                |
    | 201     | ^SELECT             | 11                    | NULL                |
    +---------+---------------------+-----------------------+---------------------+
    2 rows in set (0.00 sec)
    
    mysql> UPDATE mysql_query_rules SET gtid_from_hostgroup = 10 WHERE rule_id = 201;
    Query OK, 1 row affected (0.00 sec)
    
    mysql> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
    Query OK, 0 rows affected (0.00 sec)
    
    Query OK, 0 rows affected (0.25 sec)
    
    mysql> SELECT rule_id, match_digest, destination_hostgroup, gtid_from_hostgroup FROM mysql_query_rules;
    +---------+---------------------+-----------------------+---------------------+
    | rule_id | match_digest        | destination_hostgroup | gtid_from_hostgroup |
    +---------+---------------------+-----------------------+---------------------+
    | 200     | ^SELECT.*FOR UPDATE | 10                    | NULL                |
    | 201     | ^SELECT             | 11                    | 10                  |
    +---------+---------------------+-----------------------+---------------------+
    2 rows in set (0.00 sec)

Network Traffic:

One of the advantages of this feature is that it has minimal impact on your network, since proxysql binlog reader reads only the portion of each binlog event needed to extract the GTID, and only that is sent over the network. This is unlike a normal slave, where all of the binlog events are sent. A GTID event is classified into three categories, and its size varies depending on the state (the names below are not official):

  1. Full
    marcelo-altmann-PU-replication-1.lxd.opsession-prxy > marcelo-altmann-PU-proxysql-1.lxd.41726: Flags [P.], cksum 0x5dfd (incorrect -> 0xc8cb), seq 1:49, ack 1, win 510, options [nop,nop,TS val 792525047 ecr 430181741], length 48
    	0x0000:  4500 0064 61e3 4000 4006 7b0a ac10 025f  E..da.@.@.{...._
    	0x0010:  ac10 0327 0ceb a2fe c48c 6725 bf6c 4c56  ...'......g%.lLV
    	0x0020:  8018 01fe 5dfd 0000 0101 080a 2f3c f8f7  ....]......./<..
    	0x0030:  19a4 0d6d 5354 3d64 6131 6262 3030 302d  ...mST=da1bb000-
    	0x0040:  3563 3963 2d31 3165 392d 3965 6537 2d30  5c9c-11e9-9ee7-0
    	0x0050:  3031 3633 6566 6162 6163 303a 312d 3535  0163efabac0:1-55
    	0x0060:  3239 390a                                299.

    When ProxySQL detects this, it is a completely new GTID (for a new server_uuid). In the above example, we are sending 48 bytes corresponding to the full GTID set of da1bb000-5c9c-11e9-9ee7-00163efabac0:1-55299. This is normally sent when ProxySQL has no record yet of GTIDs for this server.
  2. Delta
    01:30:26.066393 IP (tos 0x0, ttl 64, id 25060, offset 0, flags [DF], proto TCP (6), length 94)
        marcelo-altmann-PU-replication-1.lxd.opsession-prxy > marcelo-altmann-PU-proxysql-1.lxd.41726: Flags [P.], cksum 0x5df7 (incorrect -> 0x93b9), seq 49:91, ack 1, win 510, options [nop,nop,TS val 792557032 ecr 430181741], length 42
    	0x0000:  4500 005e 61e4 4000 4006 7b0f ac10 025f  E..^a.@.@.{...._
    	0x0010:  ac10 0327 0ceb a2fe c48c 6755 bf6c 4c56  ...'......gU.lLV
    	0x0020:  8018 01fe 5df7 0000 0101 080a 2f3d 75e8  ....]......./=u.
    	0x0030:  19a4 0d6d 4931 3d64 6131 6262 3030 3035  ...mI1=da1bb0005
    	0x0040:  6339 6331 3165 3939 6565 3730 3031 3633  c9c11e99ee700163
    	0x0050:  6566 6162 6163 303a 3535 3330 300a       efabac0:55300.

    When ProxySQL has already sent a full GTID event, it will send only the delta event, composed of the full server_uuid plus the incremental id. In the above example, we are sending 42 bytes (instead of the initial 48) and the GTID event is da1bb000-5c9c-11e9-9ee7-00163efabac0:1-55300.
  3. Incremental
    01:30:44.388537 IP (tos 0x0, ttl 64, id 25061, offset 0, flags [DF], proto TCP (6), length 61)
        marcelo-altmann-PU-replication-1.lxd.opsession-prxy > marcelo-altmann-PU-proxysql-1.lxd.41726: Flags [P.], cksum 0x5dd6 (incorrect -> 0xa8c8), seq 91:100, ack 1, win 510, options [nop,nop,TS val 792575354 ecr 430213726], length 9
    	0x0000:  4500 003d 61e5 4000 4006 7b2f ac10 025f  E..=a.@.@.{/..._
    	0x0010:  ac10 0327 0ceb a2fe c48c 677f bf6c 4c56  ...'......g..lLV
    	0x0020:  8018 01fe 5dd6 0000 0101 080a 2f3d bd7a  ....]......./=.z
    	0x0030:  19a4 8a5e 4932 3d35 3533 3031 0a         ...^I2=55301.

    All events after the Delta will contain only the incremental part of the GTID (the server_uuid is omitted). In the above example, we are sending only 9 bytes for GTID 55301 (the complete GTID is da1bb000-5c9c-11e9-9ee7-00163efabac0:1-55301).

Testing:

To simulate the issue, we will use a simple PHP script:

<?php
date_default_timezone_set('UTC');
$mysqli = new mysqli('127.0.0.1', 'root', 'sekret', 'test', 6033);
if ($mysqli->connect_error) {
    die('Connect Error (' . $mysqli->connect_errno . ') '
            . $mysqli->connect_error);
}

/* Setup */
echo date('Y-m-d H:i:s') . " Starting to SETUP the test\n";
$mysqli->query("DROP TABLE IF EXISTS joinit");
$mysqli->query("CREATE TABLE IF NOT EXISTS `test`.`joinit` (
  `i` bigint(11) NOT NULL AUTO_INCREMENT,
  `s` char(255) DEFAULT NULL,
  `t` datetime NOT NULL,
  `g` bigint(11) NOT NULL,
  KEY(`i`, `t`),
  PRIMARY KEY(`i`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;");
$date1=date('Y-m-d H:i:s');
$mysqli->query("INSERT INTO test.joinit VALUES (NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )));");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");
$mysqli->query("INSERT INTO test.joinit SELECT NULL, uuid(), time('$date1'),  (FLOOR( 1 + RAND( ) *60 )) FROM test.joinit;");

echo date('Y-m-d H:i:s') . " Starting to RUN the test\n";

$result = $mysqli->query("SELECT MAX(i) FROM joinit");
$row = $result->fetch_row();
sleep(2);
$date2=date('Y-m-d H:i:s');

for ($i=1; $i<$row[0]; $i++)
{
  $result = $mysqli->query("SELECT i FROM joinit WHERE i = $i");
  if($result->num_rows == 0)
    continue;
  
  $mysqli->query("UPDATE joinit SET t = '$date2' WHERE i = $i");
  
  $result = $mysqli->query("SELECT i FROM joinit WHERE t = '$date2' AND i = $i");
  if($result->num_rows == 0)
  {
      echo date('Y-m-d H:i:s') . " Dirty Read Detected on i $i . . .";
      usleep(500000);
      $result = $mysqli->query("SELECT i FROM joinit WHERE t = '$date2' AND i = $i");
      echo " After 500ms rows found $result->num_rows \n";
  } else {
    echo date('Y-m-d H:i:s') . " i $i is ok\n";
  }
}
$mysqli->close();
?>

The script will:

  1. Populate a table with some random data.
  2. For each row, it will update a datetime column with the current time.
  3. Immediately after the update, it will try to query one of the slaves using the current time as a parameter. If the slave has not processed the update from step 2, it will return 0 rows. The script will report a dirty read and try the same select again after 0.5 seconds and report the number of returned rows.
  4. If step 3 succeeded, it will report as ok.

Example run without GTID consistent reads:

2020-02-07 17:13:52 Starting to SETUP the test
2020-02-07 17:13:53 Starting to RUN the test
2020-02-07 17:13:55 i 1 is ok
2020-02-07 17:13:55 Dirty Read Detected on i 2 . . . After 500ms rows found 1
2020-02-07 17:13:55 i 3 is ok
2020-02-07 17:13:55 Dirty Read Detected on i 4 . . . After 500ms rows found 1
2020-02-07 17:13:56 Dirty Read Detected on i 6 . . . After 500ms rows found 1
2020-02-07 17:13:56 i 7 is ok
2020-02-07 17:13:56 i 8 is ok
2020-02-07 17:13:56 i 9 is ok
2020-02-07 17:13:56 i 13 is ok
2020-02-07 17:13:56 i 14 is ok
2020-02-07 17:13:56 Dirty Read Detected on i 15 . . . After 500ms rows found 1

Example run with GTID consistent reads:

2020-02-07 17:15:27 Starting to SETUP the test
2020-02-07 17:15:28 Starting to RUN the test
2020-02-07 17:15:30 i 1 is ok
2020-02-07 17:15:30 i 2 is ok
2020-02-07 17:15:30 i 3 is ok
2020-02-07 17:15:30 i 4 is ok
2020-02-07 17:15:30 i 6 is ok
2020-02-07 17:15:30 i 7 is ok
2020-02-07 17:15:30 i 8 is ok
2020-02-07 17:15:30 i 9 is ok
2020-02-07 17:15:30 i 13 is ok
2020-02-07 17:15:30 i 14 is ok
2020-02-07 17:15:30 i 15 is ok

Monitoring:

To monitor whether a backend server is sending GTID events, you can query the stats_mysql_gtid_executed table:

mysql> SELECT * FROM stats_mysql_gtid_executed;
+---------------+------+-----------------------------------------------+--------+
| hostname      | port | gtid_executed                                 | events |
+---------------+------+-----------------------------------------------+--------+
| 10.126.47.251 | 3306 | 278a1c44-51c8-11ea-a1ad-00163e1e8e27:1-344029 | 72091  |
| 10.126.47.170 | 3306 | 278a1c44-51c8-11ea-a1ad-00163e1e8e27:1-344029 | 72089  |
+---------------+------+-----------------------------------------------+--------+
2 rows in set (0.01 sec)

In order to verify whether queries are being served using this feature, you can then query the stats_mysql_connection_pool table and look at the Queries_GTID_sync column:

mysql> SELECT * FROM stats_mysql_connection_pool;
+-----------+--------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| hostgroup | srv_host     | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+--------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 11        | 10.126.47.79 | 3306     | ONLINE | 0        | 2        | 2      | 0       | 1           | 1327    | 1309              | 59965           | 7866            | 196        |
| 11        | 10.126.47.39 | 3306     | ONLINE | 0        | 2        | 2      | 0       | 1           | 18      | 0                 | 806             | 77              | 632        |
| 10        | 10.126.47.79 | 3306     | ONLINE | 0        | 1        | 2      | 0       | 1           | 602     | 10                | 39383           | 84              | 196        |
+-----------+--------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
3 rows in set (0.01 sec)

Summary:

There are a few caveats to consider if you want to adopt this feature:

  1. In order for this feature to work, the read query must be executed in the same connection as the write query. Or, if you are using proxysql version 2.0.9 onwards, you can send a query comment specifying the minimum GTID which a server must have in order to serve the read.
  2. Some write queries don't generate a GTID, for example, an UPDATE that affects 0 rows (either because the condition matches no row in the table or because the updated columns already hold the new values). In this case, ProxySQL will not use the feature and will redirect the following reads to any slave, without incrementing Queries_GTID_sync.
  3. This feature doesn’t work with Galera replication.
  4. If you don't have your master as part of your read hostgroup (that is done by default with the proxysql variable mysql-monitor_writer_is_also_reader), ProxySQL will behave as if it were executing SELECT WAIT_FOR_EXECUTED_GTID_SET: the read query will stall until the slave has applied the requested GTID or the proxysql mysql-connect_timeout_server_max timeout has elapsed (see the sketch below).
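
Regarding the last point, the manual equivalent of what ProxySQL does in that situation looks roughly like this (using a GTID set from the earlier examples; the second argument is a timeout in seconds):

# block until this server has applied the given GTID set, or give up after 1 second
mysql> SELECT WAIT_FOR_EXECUTED_GTID_SET('da1bb000-5c9c-11e9-9ee7-00163efabac0:1-55301', 1);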

Moving from a centralized architecture into a distributed one can bring some challenges, and a delay in replication causing slaves to provide stale data is one of them. ProxySQL binlog readers can help mitigate this issue.

Please note: This feature and binary (proxysql_binlog_reader) are relatively new and are not considered GA as of now. We highly advise you to test them extensively before implementing them in production.

Webinar 2/26: Building a Kubernetes Operator for Percona XtraDB Cluster


This talk covers some of the challenges we sought to address by creating a Kubernetes Operator for Percona XtraDB Cluster, as well as a look into the current state of the Operator, a brief demonstration of its capabilities, and a preview of the roadmap for the remainder of the year. Find out how you can deploy a 3-node PXC cluster in under five minutes and handle providing self-service databases on the cloud in a cloud-vendor agnostic way. You’ll have the opportunity to ask the Product Manager questions and provide feedback on what challenges you’d like us to solve in the Kubernetes landscape.

Please join Percona Product Manager Tyler Duzan on Wednesday, February 26, 2020, at 1 pm EST for his webinar “Building a Kubernetes Operator for Percona XtraDB Cluster”.

Register Now

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

MySQL Encryption: How Master Key Rotation Works


In the last blog post of this series, we discussed in detail how Master Key encryption works. In this post, based on what we already know about Master Key encryption, we look into how Master Key rotation works.

The idea behind Master Key rotation is that we want to generate a new Master Key and use this new Master Key to re-encrypt the tablespace keys (stored in each tablespace's header).

Let's remind ourselves what a Master Key encryption header looks like (it is located in the tablespace's header):

From the previous blog post, we know that when a server starts, it goes through all encrypted tablespaces' encryption headers. While doing so, it remembers the highest KEY ID it read from all the encrypted tablespaces. For instance, if we have three tables with KEY_ID = 3 and one table with KEY_ID = 4, the highest key ID found in the server is 4. Let's call this highest KEY ID the MAX KEY ID.

How Master Key Rotation Works, Step by Step:

1. The user issues ALTER INSTANCE ROTATE INNODB MASTER KEY;

2. The server asks the keyring to generate a new Master Key with the server's UUID and a KEY_ID of MAX KEY ID incremented by one, so we get INNODB_KEY-UUID-(MAX_KEY_ID+1). On successful Master Key generation, the MAX KEY ID is incremented by one (i.e. MAX_KEY_ID = MAX_KEY_ID + 1).

3. The server goes through all the Master Key encrypted tablespaces in the server and for each tablespace:

– re-encrypts the tablespace key with the new Master Key
– updates the key ID to the new MAX KEY ID
– if the UUID is different from the server's UUID, it gets set to the server's UUID

As we know, the Master Key ID used to decrypt a tablespace is built from the UUID and KEY ID read from the tablespace's header. What we are doing now is updating this information in the tablespace's encryption header, so the server will retrieve the correct Master Key when trying to decrypt the tablespace.
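
To make this concrete, here is a minimal sketch (it assumes a keyring plugin or component is already loaded and tablespace encryption is available; the table name is illustrative):

# create a table whose tablespace key is encrypted with the current Master Key
CREATE TABLE t_secure (id INT PRIMARY KEY, payload VARBINARY(255))
  ENGINE=InnoDB ENCRYPTION='Y';

# rotate: a new Master Key is generated in the keyring and every Master
# Key-encrypted tablespace header is re-encrypted with it; the table data
# itself is not rewritten
ALTER INSTANCE ROTATE INNODB MASTER KEY;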

If we happen to have tablespaces coming from different places – like, for instance, retrieved from different backups – those tablespaces may be using different Master Keys. All those Master Keys would need to be retrieved from keyring on server startup. This might make the server’s startup slow, especially if we are using server-based keyring. With Master Key rotation, we re-encrypt tablespace keys with one – the same for all tablespaces – Master Key. Now the server needs to retrieve only one Master Key from Key server (for server-based keyring) on startup.

This is, of course, only a nice side effect; the main reason we do Master Key rotation is to make our server more secure. If the Master Key was somehow stolen from the keyring (for instance, from Vault Server), we can generate a new Master Key and re-encrypt the tablespace keys, making the stolen key no longer valid. We are safe … almost.

In the previous blog post, I explained that once a decrypted tablespace key is stolen, a third party can keep using it to decrypt our data, given that they have access to our disk. If the Master Key was stolen and the third party also had access to our encrypted data, they could use the stolen Master Key to decrypt the tablespace key and thus be able to decrypt the data. As we can see, Master Key rotation will not help us in that case. We will re-encrypt the tablespace key with the new Master Key, but the actual tablespace key used to encrypt/decrypt the tablespace will remain the same, so "a hacker" can keep using it to decrypt the data. I previously hinted that Percona Server for MySQL has a way of doing actual re-encryption of tablespaces instead of just re-encrypting the tablespace key. The feature is called encryption threads; however, at this point in time, it is still experimental.

A case where Master Key rotation is helpful is when the Master Key is stolen but the attacker has not yet had a chance to use it to decrypt our tablespace keys.

A Hidden Gem in MySQL: MyRocks


In this blog post, we will share some experiences with the hidden gem in MySQL called MyRocks, a storage engine for MySQL's famous pluggable storage engine system. MyRocks is based on RocksDB, which is a fork of LevelDB. In short, it's another key-value store based on an LSM-tree, thus granting it some distinctive features compared with other MySQL engines. It was introduced in 2016 by Facebook and later included in Percona Server for MySQL and MariaDB.

Background and History

The original paper on LSM was published in 1996, and if you need a single takeaway, the following quote is the one: “The LSM-tree uses an algorithm that defers and batches index changes, cascading the changes from a memory-based component through one or more disk components in an efficient manner reminiscent of merge sort.”  At the time, disks were slow and IOPS expensive, and the idea was to minimize the write costs by essentially turning random write load into a sequential one. The technology is quite popular, being a foundation or inspiration in a multitude of databases and storage engines: HBase, LevelDB, RocksDB, Tarantool, WiredTiger, and more. Even in 2020, when storage is faster and cheaper, LSM-tree can still provide substantial benefits for some workloads.

The development of MyRocks was started around 2015 by Facebook. Yoshinori Matsunobu gave multiple presentations detailing the reasoning behind using RocksDB inside MySQL: Facebook was underutilizing its servers because they were constrained by disk space, and MyRocks allowed for better space efficiency, a property inherent to LSM-tree storage engines.

So far, MyRocks remains a somewhat niche solution and, frankly, not a lot of people know about it or consider its use. Without further ado, let’s see how it works and why you would want to use it.

Working Dynamics of MyRocks

MyRocks in MySQL

The MyRocks engine is based on the LSM-tree structure, which we mentioned above. That makes it a very different beast than InnoDB. So let’s take a high-level overview of MyRocks internals. First, how does row-based data fit into a key-value store? You can think of a regular clustered index as a key-value structure of its own: there’s a key, whose value is the whole row. Secondary index entries store the primary key as their value, along with any additional column data.

Writes

All writes in MyRocks are done sequentially to a special structure called memtable, one of the few mutable structures in the engine. Since we need the writes to actually be durable, all writes are also written to WAL (a concept similar to InnoDB redo log), which is flushed to disk. Once the memtable becomes full, it’s copied in memory and made immutable. In the background, the immutable memtables will be flushed to disk in the form of sorted string tables (SSTs), forming the L0 of the multi-leveled compaction scheme. During this initial flush, changes in the memtable are deduplicated (a thousand updates for one key become a single update). Resulting SSTs are immutable, and, on L0, have overlapping data.

As more SSTs are created on L0, they start to pour over to L1…L6. On each level after L0, data within SSTs is not overlapping, so compaction can proceed in parallel. Compaction takes an SST from the higher level and merges it with one (or more) SSTs on the lower level, deleting the originals and creating new ones on the lower level. Eventually, data reaches the lowest level. As you can see below, each level holds more and more data, so most data is actually stored at the lower levels. The merge happens on key-value pairs, and during the merge a KV pair on the lower level is always older than the same key on the higher level and can therefore be discarded.

LSM Leveled Compaction

 

Having immutable SSTs allows them to be filled to 100% all the time, improving space utilization. In fact, that’s one of the selling points of MyRocks, as it allows for greater space efficiency. In addition to the inherent compactness of the SSTs, data there is also compressed, which further minimizes the footprint. An interesting feature here is that you can specify different compression algorithms for the bottommost (where, by nature, most of the data is) and other levels.
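As an illustration, the per-level compression choice is exposed through the column family options. A minimal my.cnf sketch might look like the following; treat the exact option string as an assumption to verify against your MyRocks version:

[mysqld]
# LZ4 on the upper levels, heavier ZSTD on the bottommost level, where most of the data lives
rocksdb_default_cf_options=compression=kLZ4Compression;bottommost_compression=kZSTD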

Another important component for the MyRocks engine is Column Family (CF). Each key-value pair (or, in familiar terms, each index) is associated with a CF. Quoting the Percona Server for MySQL docs: “Each column family has distinct attributes, such as block size, compression, sort order, and MemTable.” In addition to controlling physical storage characteristics, this provides atomicity for queries across different key spaces.
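To make this concrete, here is a sketch of how indexes can be pinned to specific column families in Percona Server for MySQL; the table and column family names are made up for illustration:

CREATE TABLE user_events (
  id BIGINT NOT NULL,
  user_id BIGINT NOT NULL,
  payload VARCHAR(255),
  -- column families are assigned per index via the index comment
  PRIMARY KEY (id) COMMENT 'cfname=cf_events_pk',
  KEY idx_user (user_id) COMMENT 'cfname=cf_events_user'
) ENGINE=ROCKSDB;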

MyRocks in MySQL

Reads

So far we’ve only been talking about writing the data. Reading it is also quite different in MyRocks due to its structure. Since the data is leveled, to find a value for a key, you need to look at memtables, L0, L1 … L6. This is an LSM read penalty. However, you don’t always have to scan the whole dataset to find the row, and not all scans go to disk. The read path starts in memtables, which will by definition have the most recent data. Then the block cache will be used, which might contain the recently-accessed data.

Once the in-memory options are exhausted, reads spill to disk and start traversing SSTs on consecutive levels. L0 has to be scanned whole, since data in its SSTs overlaps, but only a subset of SSTs on other levels has to be scanned, as we know the key ranges of data inside each SST. To further trim this scanning, bloom filters are utilized: they answer the question “is this key present in a given SST?”, and their answer is only definitive when it is “no”. Thus, we can avoid reading some SSTs whose key range covers the key we look for but which do not actually contain it. Unfortunately, for now, there’s no BF-like technique for range scans, though prefix bloom filters might help.

Each time we find the data we’re looking for, we populate the block cache for future use. In addition to that, index and bloom filter data is also cached, speeding up the SST scans even when the data is not in the block cache. Even with all of these improvements, you can see that, in general, reads are more involved than they are in regular B-tree storage engines. The negative effects, however, become less pronounced the larger the data set gets.

Tools and Utilities

Production readiness of a solution is defined not only by its own maturity but also by the ecosystem around it. Let’s review how MyRocks fits with existing tools and regular maintenance activities.

First and foremost, can we back it up online with minimal locking, as we can with InnoDB? The answer is yes (with some catches). Facebook’s original MySQL 5.6 branch includes the myrocks_hotbackup script, which enables hot backups of MyRocks, but no other engines. Starting with Percona XtraBackup 8.0.6 and Mariabackup 10.2.16/10.3.8, we have the ability to use a single tool to back up heterogeneous clusters.

One of the significant MyRocks limitations is that it doesn’t support online DDL as InnoDB does. You can use solutions like pt-online-schema-change and gh-ost, which are preferred anyway when doing large table changes. For pt-osc, there are some details to note. Global transaction isolation should be set to Read Committed, or pt-osc will fail when a target table is already in RocksDB engine. It also needs binlog_format to be set to ROW. Both of these settings are usually advisable for MyRocks anyway, as it doesn’t support gap locking yet, and so its repeatable read implementation differs.
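For reference, the two settings mentioned above can be applied at runtime like this (a sketch; for persistence, put the equivalents in my.cnf):

-- prerequisites commonly needed for pt-online-schema-change against MyRocks tables
SET GLOBAL transaction_isolation = 'READ-COMMITTED';  -- on older servers the variable is named tx_isolation
SET GLOBAL binlog_format = 'ROW';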

Because we’re limited to ROW-level replication, tools like pt-table-checksum and pt-table-sync will not work, so be careful with the data consistency.

Monitoring is another important consideration for production use. MyRocks is quite well-instrumented internally, providing more than a hundred metrics, extensive show engine output, and verbose logging. Here’s an overview of some of the available metrics: MyRocks Information Schema. With Percona Monitoring and Management, you get a dedicated dashboard for MyRocks, providing an overview of the internals of the engine.
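Outside of PMM, a quick look at the engine internals is available straight from the SQL prompt; a few starting points, assuming a MyRocks build that exposes the usual INFORMATION_SCHEMA tables and status counters:

-- per-level compaction statistics and overall engine state
SHOW ENGINE ROCKSDB STATUS\G
-- per-column-family counters
SELECT * FROM INFORMATION_SCHEMA.ROCKSDB_CFSTATS;
-- block cache hit/miss counters
SHOW GLOBAL STATUS LIKE 'rocksdb_block_cache%';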

Partitioning in MyRocks is supported and has an interesting feature where you can assign partitions to different column families: Column Families on Partitioned Tables.

Unfortunately, for now, encryption does not work with MyRocks, even though RocksDB supports pluggable encryption.

Load Test and Comparison Versus InnoDB

We have compiled a basic load test on MyRocks vs InnoDB with the following details. 

We downloaded the Ontime Performance Data Reporting for the year 2019 and loaded it into both engines. The test consisted of loading one year’s worth of data (about 14 million rows) into a single table. Load scripts can be found in the GitHub repo.

AWS instance: t2.large – 8 GB RAM – 16 GB SSD

Engine                            Size     Duration  Rows        Method
innodb + log_bin off              5.6Gb    9m56      14,009,743  Load Infile
innodb + log_bin on               5.6Gb**  11m58     14,009,743  Load Infile
innodb compressed + log_bin on    2.6Gb**  17m9      14,009,743  Load Infile
innodb compressed + log_bin off   2.6Gb    15m56     14,009,743  Load Infile
myrocks/lz4 + log_bin on          1.4G*    9m24      14,009,743  Load Infile
myrocks/lz4 + log_bin off         1.4G*    8m2       14,009,743  Load Infile

 

* MyRocks WAL files aren’t included (This is a configurable parameter) 

**InnoDB Redo logs aren’t included

Conclusion

As we’ve shown above, MyRocks can be a surprisingly versatile choice of the storage engine. While usually it’s sold on space efficiency and write load, benchmarks show that it’s quite good in TPC-C workload. So when would you use MyRocks?

 In the simplest terms:

  • You have extremely large data sets, much bigger than the memory available
  • The bulk of your load is write-only
  • You need to save on space

This translates best to servers with expensive storage (SSDs) and to the cloud, where storage can be a significant cost factor.

But real databases rarely consist of pure log data. We do selects, be they point lookups or range queries, and we modify the data. As it happens, if you can sacrifice some database-side constraints, MyRocks can be surprisingly good as a general-purpose storage engine, more so the larger your data set. Give it a try, and let us know.

Limitations to consider before moving forward:

  • Foreign Keys
  • Full-Text Keys
  • Spatial Keys
  • No Tablespaces (instead, Column Families)
  • No Online DDL (pt-osc and gh-ost help here)
  • Other limitations listed in the documentation
  • Not supported by Percona XtraDB Cluster/Galera
  • Only binary collations supported for indexes

Warnings:

MyRocks is designed for small transactions, so bulk operations need specific configuration. For loading data, use rocksdb_bulk_load=1, and for deleting large data sets use rocksdb_commit_in_the_middle.
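A sketch of both patterns, using the ontime table from the load test above (the file path is hypothetical):

-- bulk loading into an empty MyRocks table
SET SESSION rocksdb_bulk_load = 1;
LOAD DATA INFILE '/tmp/ontime_2019.csv' INTO TABLE ontime
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';
-- note: bulk load expects rows in primary key order unless rocksdb_bulk_load_allow_unsorted is enabled
SET SESSION rocksdb_bulk_load = 0;

-- large deletes without accumulating one huge transaction
SET SESSION rocksdb_commit_in_the_middle = 1;
DELETE FROM ontime WHERE FlightDate < '2018-01-01';
SET SESSION rocksdb_commit_in_the_middle = 0;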

Mixing different storage engines in one transaction will work, but be aware of the differences in how isolation levels work between the InnoDB and RocksDB engines, and of limitations such as the lack of savepoints. Another important thing to note when mixing storage engines is that they use different memory structures, so plan memory allocation carefully.

Corrupted immutable files are not recoverable.

References 

MyRocks Deep Dive

Exposing MyRocks Internals Via System Variables: Part 1, Data Writing

Webinar: How to Rock with MyRocks

MyRocks Troubleshooting

MyRocks Introduction

Optimizer Statistics in MyRocks

MyRocks and InnoDB: a summary

RocksDB Is Eating the Database World

Percona Live Austin 2020 Sneak Peek!

percona live 2020 sneak peek

It’s happening! Percona Live is coming soon and our teams are hard at work finalizing the agenda. Join us for three days of collaboration, learning, and fun, with presentations and discussions on topics relevant to the development, implementation, and management of open source database solutions.

Percona Live 2020

With early bird ticket pricing available until March 1, there’s no better time to reserve your spot. We are excited to feature a new location this year, the AT&T Hotel and Conference Center in Austin, TX from May 18-20.

Register Now for Percona Live 2020!

Why Should You Attend Percona Live?

  • See how companies build success in a multi-cloud, multi-database environment.
  • Learn how to enhance cloud-native applications in your database environment.
  • Share tips on how to manage systems at scale effectively.
  • Learn what it takes to keep systems up and running in an outage or slowdown.
  • Explore the latest data security practices that help prevent data leakage.
  • Learn best practices for enabling developers to self-support their database choices.

Percona Live 2020

A Sneak Peek:

This year, our agenda is packed with sessions from industry leaders. Here’s a preview of what’s to come:

  • Scaling Financial Audit Logs at TransferWise with MongoDB — Pedro Albuquerque, Transferwise
  • Experiences with JSON in MySQL 8.0 at Paypal — Yashada Jadhav and Stacy Yuan,  Paypal
  • Backup and Recovery of Databases in Cloud Native Environments — Tom Manville, Kasten
  • PostgreSQL Tutorial For Oracle & MySQL DBAs or Beginners — Avinash Vallarapu, Percona
  • Reactive Distributed MicroServices with Apache Cassandra & Docker — Dinesh Joshi, Apache Cassandra 
  • The Venerable Buffer Pool: What’s Its Role If Storage Is Fast? — Dave Cohen & Szymon Scharmac, Intel; Saif Alharthi, PlanetScale.
  • Our Journey to Better MySQL Availability Using Global Transaction IDs, ProxySQL and Consul — Stephane Combaudon and Yuriy Olshanetskiy, Rakuten

The full schedule featuring 11 tutorials and over 80 talks will be announced in just a couple of weeks!

New This Year!

A Level 99 breakout session for ‘experts’ only, designed for the most advanced, hardcore attendees. Are you ready to level up?

Percona Live 2020

With every Percona Live, your ticket includes access to the expo hall featuring industry-leading software providers and community projects plus plenty of opportunities for networking, at events like the Welcome Reception, and more!

Register Now!

Percona Monitoring and Management, Meet Prometheus Alertmanager

Percona Monitoring and Management and Prometheus Alertmanager

One of the requests we get most often on the Percona Monitoring and Management (PMM) team is “Do you support alerting?”  The answer to that question has always been “Yes”, but the feedback on how we offered it natively was that it was, well, not robust enough!  We’ve been hard at work to change that and are excited to offer, starting with the newly released PMM version 2.3.0, a more dynamic alerting mechanism for your PMM installations: integration with Prometheus Alertmanager.

Prometheus Alertmanager

If you don’t know what Alertmanager is, you can read all about it on the Prometheus website, but the short version is that Alertmanager is a receiver, consolidator, and router of alerting messages that offers LOTS of flexibility when it comes to configuration.  From my old days as a SysAdmin, the tools I used weren’t smart enough to deduplicate alerts, so I’d have my boss yelling, my coworkers emailing, and my phone (ok… Blackberry) battery depleting itself vibrating over the same alert, again and again, until I could manage to put the alert in maintenance mode and drain the queue.  Alertmanager is smart enough to deduplicate alerts, so you don’t get 50 pages telling you the disk is 90% full before you can grow the volume or purge files. It’s also extremely easy to group alerts, so you don’t get separate alerts for ‘Application Down’, ‘MySQL Down’, ‘CPU|RAM|Disk: Unavail’, etc. because someone rebooted the DB server without putting monitoring in maintenance mode.  Alertmanager also offers many native integrations, so you can route alerts to email, SMS, PagerDuty, Slack, and more!

Now, this is our first iteration of Alertmanager support, so at this point you will need your own working Alertmanager installation that your PMM server can communicate with.  The only other thing you’ll need is the rules you want to trigger alerts from. That’s basically it! You most likely already know how to create YAML-style rules, but for the curious, a rule looks something like this:

- alert: PostgresqlDown
  expr: pg_up == 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "PostgreSQL down (instance {{ $labels.service_name }})"
    description: "PostgreSQL instance is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

The above will trigger an alert to let you know which PostgreSQL instances monitored by PMM have been down for more than 5 minutes.  Since this first pass targets experienced users, I’ll leave it to you to craft your own rules, but we’re really excited to be adding this sorely needed functionality!
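On the Alertmanager side, a minimal routing configuration (alertmanager.yml) that sends everything to a single Slack channel could look like the sketch below; the webhook URL and channel name are placeholders:

route:
  receiver: slack-notifications
  group_by: ['alertname', 'service_name']
  group_wait: 30s
  repeat_interval: 4h

receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#db-alerts'
        send_resolved: true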

Prometheus Alertmanager

For more information, you can read our Alertmanager integration documentation and FAQs.  Update your instance today and let us know what you think; we would love to hear your feedback!


MySQL ERROR 1034: Incorrect Key File on InnoDB Table


Sometimes you may experience “ERROR 1034: Incorrect key file” while running an ALTER TABLE or CREATE INDEX command:

mysql> alter table ontime add key(FlightDate);
ERROR 1034 (HY000): Incorrect key file for table 'ontime'; try to repair it

As the error message mentions key file, it is reasonable to assume we’re dealing with the MyISAM storage engine (the legacy storage engine which used to have such a thing), but no, we can clearly see this table is InnoDB!

When the error message in MySQL is confusing or otherwise unhelpful, it is a good idea to check the MySQL error log:

2019-02-24T02:02:26.100600Z 9 [Warning] [MY-012637] [InnoDB] 1048576 bytes should have been written. Only 696320 bytes written. Retrying for the remaining bytes.
2019-02-24T02:02:26.100884Z 9 [Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
2019-02-24T02:02:26.100894Z 9 [ERROR] [MY-012639] [InnoDB] Write to file (merge) failed at offset 625999872, 1048576 bytes should have been written, only 696320 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2019-02-24T02:02:26.100907Z 9 [ERROR] [MY-012640] [InnoDB] Error number 28 means 'No space left on device'

The most important part of this message is “Error number 28 means ‘No space left on device’” – so, we’re simply running out of disk space. You may wonder, though, what file is it being written to and where is it located?  “Write to file (merge) failed” is your (albeit, not fully helpful) indication; “merge” here corresponds to the temporary file which is used to perform a Merge Sort operation when building Indexes through Sort (AKA Innodb Fast Index Creation).

This file is created in the directory set by the innodb_tmpdir server variable; if that is not set, it falls back to the tmpdir variable, which defaults to the OS temporary directory, such as /tmp on Linux.  In many cases, such a tmpdir is located on a filesystem that has little space, making this error occur quite frequently.
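To see where those temporary merge files are going, and to move them to a filesystem with enough free space, something like the following works (the target path is just an example):

-- where are the merge-sort temporary files being written?
SHOW GLOBAL VARIABLES LIKE 'innodb_tmpdir';
SHOW GLOBAL VARIABLES LIKE 'tmpdir';

-- innodb_tmpdir is dynamic, so it can be pointed at a larger filesystem without a restart
SET GLOBAL innodb_tmpdir = '/data/mysql-tmp';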

The amount of disk space required can be significant, sometimes exceeding the total size of the final table. When adding indexes on CHAR/VARCHAR columns, especially with multibyte character sets (utf8, utf8mb3, utf8mb4), the space allocated for each index entry is roughly the maximum number of bytes per character in the charset multiplied by the maximum length of the string.  So adding an index on a utf8 VARCHAR(100) column will require roughly 400 bytes for every row in the table.
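As a rough sanity check before running the ALTER, you can turn that per-row figure into an estimate of the temporary space needed; a back-of-the-envelope sketch using the ~400 bytes per row from above:

-- estimated merge-file space for the new index, in GB
SELECT ROUND(COUNT(*) * 400 / 1024 / 1024 / 1024, 2) AS est_merge_temp_gb
FROM ontime;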

Summary:

Are you getting the “ERROR 1034: Incorrect key file” message for an InnoDB table?  Check your error log and the tmpdir server variable!

Embrace the Cloud with Microsoft Azure

Embrace the Cloud with Microsoft Azure

The race to the cloud continues to gain pace, and many businesses have already enthusiastically embraced the new opportunities and efficiencies the cloud can bring.

But, for those companies with more traditional technology structures and a lack of cloud knowledge or in-house expertise, a move to the cloud can still seem intimidating. Some businesses continue to prop up aging monolithic technologies, rather than risk relinquishing control of database security and availability, and due to concerns over downtime.

What is Microsoft Azure?

Microsoft Azure is a cloud platform for building, deploying, and managing services and applications around the globe.

It addresses many of the concerns associated with the cloud, offering an ever-expanding set of services to help organizations meet their business challenges, whether they are looking for partial or total cloud-based architecture.

You can build on Microsoft’s proven architecture to add cloud capabilities to your existing technology set-up using Azure’s Platform as a Service (PaaS) or use them to run your computing and network needs via Infrastructure as a Service (IaaS).

Azure offers a range of useful and innovative tools and services, from which you can pick and choose depending on your business requirements. These include Virtual Machines, Cloud Services, Managed Disks, Storage options, SQL Database, Bandwidth (Data Transfer), Computer Vision, Active Directory, App Service, Azure Kubernetes Service (AKS), DevTest Labs, and many more.

Azure has a strong focus on AI, analytics, and the Internet of Things. Azure Stack shows genuine market innovation. It is a portfolio of products that extend Azure services and capabilities to your environment of choice – from the data center to edge locations and remote offices. The portfolio enables hybrid and edge computing applications to be built, deployed, and run consistently across location boundaries.

 

Microsoft Azure

Photo: https://azure.microsoft.com/en-us/overview/azure-stack/#industries

 

Why Move to the Cloud?

Cloud computing platform providers promote themselves as cheaper and more reliable than traditional set-ups, and as offering stronger security than on-premises servers. They are more flexible, enabling you to instantly scale up and down depending on your business usage and storage requirements.

Offloading server management to your provider frees you from maintenance, scheduled downtime, and the risk of theft or damage, as well as eliminating the need to replace aging technology.

As cloud computing runs on data centers around the world, it provides resilience and reliability, as your data can be backed up to more than one geographic location. If required, your IT resources can be delivered from specific geographic locations.

Azure claims to be the only consistent hybrid cloud, with more regions than any other cloud provider, delivering unparalleled developer productivity. It offers comprehensive compliance coverage, including meeting the General Data Protection Regulation (GDPR).

Lower Costs

Setting up and maintaining on-site data centers is extremely expensive, especially when you factor in additional costs such as staffing, land and building purchase and maintenance, and buying and updating computer hardware.

A move to Azure means that you only pay for the resources you need, and these can be adjusted in real-time to match your business needs.

Focus on Security

Azure is backed by more than a billion US dollars of investment in security research and development and a team of over 3,500 global cybersecurity experts, working together to help safeguard business assets and data.

From Microsoft’s perspective, security is foundational. Azure benefits from multi-layered security across physical data centers, infrastructure, and operations.

State-of-the-art security is delivered in Azure data centers globally. Azure is built with customized hardware and has security controls integrated into the hardware and firmware components. It includes added protection against threats such as distributed denial-of-service (DDoS) attacks. The breadth of Azure’s security features allows its cloud platform to provide customers with industry-leading security in depth.

Azure’s high-speed and geographically decentralized infrastructure means that you have far more disaster recovery options when a problem occurs. Your critical applications and data can run from alternative sites during recovery periods, ensuring that downtime is minimized.

Explore Open Source Software

Azure actively supports open source software, driving innovation through collaboration, and contributing back to the community. Azure enables you to build on open source at scale, supporting newer open source technologies like .NET Core, TypeScript, and Visual Studio Code. They also contribute to projects such as Python, Kubernetes, PostgreSQL, and Linux kernel.

Because Azure supports open source technologies, you can use whichever tools and technologies you prefer. This means you can run virtually any application using your data source, with your operating system, on your device.

How Does Azure Compare to Amazon Web Services (AWS?)

Some of the world’s most prominent organizations have chosen Azure to support their business requirements, for good reasons:

  • Competitive pricing – According to Microsoft, AWS is five times more expensive than Azure for Windows Server and SQL Server. Azure matches AWS pricing for comparable services.
  • Open source community support –  You can use a vast range of open-source operating systems, languages, and tools on Azure. Azure made the most contributions to GitHub in 2017 and is the only cloud with integrated support for Red Hat.
  • Enhanced and proactive security and compliance – Azure’s compliance offerings include 90+ compliance certifications. This represents a more comprehensive group of certifications than AWS currently presents.
  • Gain value from your existing Microsoft investment – You can maintain your organization’s existing tools and knowledge and achieve a consistent experience across your on-premises and cloud technologies by integrating them with Azure Active Directory. This helps to reduce the effort required by companies following a lift-and-shift strategy to move to Azure.

Percona’s Partnership With Azure

Percona is a proud Microsoft Azure partner, providing software, support, and monitoring and management services for Microsoft’s Azure cloud solutions.

Percona software is an ideal database option to run on Azure Virtual Machines, enabling companies of all sizes to build, test, and deploy applications on Azure’s highly scalable and reliable compute infrastructure.

The Percona DBA Service for Microsoft Azure, as well as other support services, is available to help Azure customers get the most benefit from their open source database environments. Percona can support your Azure environment, quickly resolve issues, and provide proactive updates, oversight, and assistance 24x7x365.

Conclusion

95% of Fortune 500 companies use Azure. It is a popular and proven cloud platform option, providing an extensive array of solutions suitable for all industries, large and small.

Azure supports a wide range of programming languages, frameworks, operating systems, databases, and devices, allowing developers to continue using the tools and technologies they trust. This flexibility is extremely attractive to companies that are increasingly running multiple applications against multiple databases in multiple locations.

Azure’s exceptional track record of investing in and implementing security processes, along with its extensive geographical coverage and Microsoft’s proven technology track record, brings comfort to even the most cautious companies. The Azure approach to innovation, including AI, machine learning, and Cognitive API capabilities, shows that they are also looking to the future and exploring the opportunities this brings their users.


Credit: Microsoft Azure information obtained from https://azure.microsoft.com/en-us/


Want to learn more?

Read more of our blog series:

The Benefits of Amazon RDS for MySQL

When Should I Use Amazon Aurora and When Should I use RDS MySQL?

Benefit from Ongoing Innovation with Google Cloud

Need help with your cloud migration? Contact us today.

Percona Monitoring and Management Continues to See Increased Adoption

Percona Monitoring and Management Growth

As a new member of the Percona team, one of the first things I was interested in understanding was our software adoption.  I decided to take a look at the information we have about adoption (instances running) of Percona Monitoring and Management (PMM) and found some very positive trends.  First, I looked at overall instances as of the end of January 2020, which showed that PMM had an average of over 6,000 instances daily, with 14% of those being on version 2, which is pretty good considering it has only been out for around 4 months.

I then dug in a little bit deeper to look at the adoption of our latest release of PMM 2 which was released on September 19, 2019, so I started my analysis beginning in October for the ease of looking at full month comparisons.

First, when I looked at the average total instances running by month for PMM 2, I saw nice adoption, with overall growth of 175% across the 4 months and a January monthly average of 850 instances.

Percona Monitoring and Management

The PMM team has a regular sprint cycle which results in 2.x releases about once a month, so the next thing I wanted to understand was if our customers are picking up the latest releases or are they sticking with older versions.  As you can see in the graph below, as new releases come out, our customers are pretty quick to download and use the latest release.

PMM 2.1 is the first vertical blue line and PMM 2.2 is the second vertical blue line

This information is important because it lets us know that customers are quick to pick up the latest release and use the new features.  It also means we need to make sure that our customers know when these releases are coming, where to get them, and what’s new for them to try out – which is why PMM provides this information on its main dashboard.


You can see the current version you are running, the last time it checked for a new version, and if there is a new version available.  You’ll also notice we provide a one-click update to the latest version. This could be why the adoption of the latest release is so high.

I have worked at many companies where understanding what customers are doing with your software is a huge challenge because there was little to no data.  The best data we had was the number of downloads, which doesn’t really tell you what customers are actually running. Or we talked with our support teams about what they were seeing customers run, but that is hard to track, and you don’t really know if it is representative of your whole customer base.

So, naturally, I wanted to understand how Percona is getting such insightful data.  When a user installs PMM Server, it sends a request to our “Check Version” system. When it sends this request, it provides the PMM version and unique ID information. This is what enables Percona to get an understanding of how many users we have for which versions of PMM.

Customers can always opt out of sharing this information with us.  You can easily switch this off via the on/off switch for Call home on the PMM Settings panel, accessible from the main dashboard.

You’ll notice that we also give you a separate switch for checking for updates.  This ensures that even if you aren’t enabling a call home, you can still pull the information about the latest version and do a local check against what you are running to determine if you are up to date.

One question that the data naturally leads to is: “If we could ensure full backward compatibility, would customers want automatic upgrades?”  What do you think? Share your thoughts in the comments below or on Twitter @tlschloss.

Benefit from Ongoing Innovation with Google Cloud

Percona and Google Cloud

While Amazon Web Services (AWS) and Microsoft Azure continue to lead the pack, Google Cloud Platform had a good 2019, generating $10 billion of revenue and firmly positioning itself as a viable alternative for large and small businesses.

Google Cloud is working hard to market its unique benefits, hoping to capture a share of current and future digital transformation projects.

One of their newer innovations is Anthos, an enterprise hybrid and multi-cloud platform launched in April 2019, which lets users run applications from anywhere. Anthos is an attractive proposition that can help you accelerate application development and enable consistency across hybrid and multi-cloud environments. It is built on open source technologies, including Kubernetes, Istio, and Knative.

Multi-cloud Options

Percona’s 2019 Open Source Data Management Software Survey indicates that the future is multi-cloud as many businesses look at reducing the risk of vendor lock-in and downtime by spreading their data, applications, and technology across multiple providers.

About a third of survey respondents used multiple clouds for their databases, including 41% of larger companies. AWS dominated as the primary provider, but according to our data, smaller companies surveyed are more likely to use Google than Microsoft.

If you want to follow a multi-cloud strategy, but don’t just want to rely on the top two players, Google Cloud represents a great alternative.

Google also has some big names in its portfolio. Salesforce has an expanding partnership with Google and recently announced that they are working with them to simplify how organizations connect data across platforms, to help speed digital transformation.

Open Source and Open Cloud

Image courtesy https://cloud.google.com/open-cloud

Google believes cloud openness matters more than ever, as this leads to faster innovation, tighter security, and freedom from vendor lock-in.

Open Cloud is the idea that you should be able to use a common development and operations approach to deliver your applications to different clouds. Open source in the cloud gives you control over where you deploy your IT investments. For example, customers can use Kubernetes to manage containers and TensorFlow to build machine learning models on-premises and on multiple clouds.

Google also encourages its employees to engage with open source. According to data analyzed from the GHarchive.org dataset on BigQuery, in 2017, over 5,500 Google individuals submitted code to nearly 26,000 repositories, created over 215,000 pull requests, and engaged with countless communities through almost 450,000 comments. Googlers are active contributors to popular projects such as Linux, LLVM, Samba, and Git.

Security and Pricing

Google Cloud capitalizes on the same technology that underlies Google’s private global network. As a result, Google states that your data is protected and that it also meets rigorous industry-specific compliance standards.

Google Cloud also offers a range of pricing innovations aimed at giving customers increased flexibility. These innovations, such as per-second billing and sustained-use discounts, are intended to offer users a range of options that don’t lock them into potentially obsolete technologies.

Google Cloud Solutions

Google Cloud aims to help companies find solutions that modernize their IT, unlock business value, and enable them to participate fully in the digital economy.

Google Cloud’s technology and partners offer a range of flexible infrastructure modernization approaches. Once you’ve built your framework, you can leverage their technological innovation via a number of solutions.

Conclusion

Although Google Cloud might not be the leading cloud provider in the market, it has an impressive history of technological innovation, and a strong sense of market direction and future requirements.

Enterprises across multiple industries are taking advantage of Google’s technological advances, from embedded Artificial Intelligence and Machine Learning to Digital Assistant, Search, Maps, Voice, and more. All of these are seamlessly integrated into Google Cloud.

Additionally, to avoid vendor lock-in and minimize potential downtime, many customers are looking at strategies where their data is shared across multiple clouds. Google’s commitment to open source and open cloud may well prove an advantage in this environment.

As a proud Google Cloud partner, Percona is committed to providing innovative solutions that help you make the most of your Google Cloud investment. Our Managed Services team provides world-class expertise to help you manage your high-performance databases in this environment.

 

 

Credit: Google Cloud information obtained from https://cloud.google.com/


Want to learn more?

Read more of our blog series:

The Benefits of Amazon RDS for MySQL

When Should I Use Amazon Aurora and When Should I use RDS MySQL?

Embrace the Cloud with Microsoft Azure

 

Need help with your cloud migration? Contact us today.

Percona University is Coming to India in March!

PERCONA UNIVERSITY INDIA 2020

Experience a full day of deep-dive technical sessions, meet database experts, and join a community of open source database users, developers, and technologists at Percona University India!

Percona University is a series of free, Percona-hosted events focused on open source databases and related technologies. Since 2013, we have held these events in cities across the world to educate professionals and gather the community together.

In March 2020, we will be holding two Percona University events in India. The first will be in Noida, Uttar Pradesh, and the second will be in Bengaluru, the center of India’s high-tech industry.

 

March 26 from 10 am to 5 pm
Noida, Radisson Blu MBD Hotel
Percona University Noida – 26 March 2020

 

 

March 28 from 10 am to 5 pm
Bengaluru, Novotel Bengaluru
Percona University Bengaluru – 28 March 2020

 

Covered Technologies:

  • Open-source databases, with a focus on, but not limited to MySQL, MongoDB, PostgreSQL, and MariaDB
  • Database monitoring
  • Databases in the cloud
  • Kubernetes for databases

Agenda:

Percona founder and CEO Peter Zaitsev headlines the agenda with three talks:

  • The Percona Philosophy on Databases and Open Source
  • MySQL Optimization with Percona Monitoring and Management
  • Introduction to Percona Distribution for PostgreSQL

In addition, key Percona engineers will also present talks:

  • MySQL on Kubernetes with ProxySQL
  • MongoDB Sharded Cluster: How to Design your Topology
  • MySQL High Availability using Percona XtraDB Cluster & ProxySQL
  • Showcase Percona Kubernetes Operator for Percona XtraDB Cluster and CLI Tools
  • Keeping Enterprise-Grade MongoDB Free

More speakers and talks will be added to the agenda by March 15. We run a fast-paced event, and most talks will be 25 minutes. Networking refreshment breaks, an optional dinner, and a raffle are also available.

Join Us!

Free registration, agendas, and detailed information are available on each event page.

Percona University Noida – 26 March 2020

Percona University Bengaluru – 28 March 2020

For more information or if you have any questions, please contact: community-team@percona.com

Percona University India
