
How Securing MySQL with TCP Wrappers Can Cause an Outage


The Case

Securing MySQL is always a challenge. There are general best practices that can be followed for securing your installation, but the more complex your setup is, the more likely you are to face issues that can be difficult to troubleshoot.

We’ve recently been working on a case (thanks Alok Pathak and Janos Ruszo for their major contribution to this case) where MySQL started becoming unavailable when thread activity was high and went beyond a threshold, but not always the same one.

During that time there were many log entries like the following, and mysqld was becoming unresponsive for a few seconds.

2019-11-27T10:26:03.476282Z 7736563 [Note] Got an error writing communication packets
2019-11-27T10:26:03.476305Z 7736564 [Note] Got an error writing communication packets

“Got an error writing communication packets” is quite a common log message which may be caused by multiple reasons.

The official MySQL documentation covers this in B.4.2.10 Communication Errors and Aborted Connections, and many blog posts have been written about it as well.

How We Approached This Issue to Find the Root Cause

The first thing to do was to remotely execute a simple loop to figure out whether this was happening randomly, and whether it was a network issue or an issue related to mysqld itself.

[RDBA] percona@monitoring1: ~ $ time for i in {1..100}; do mysql -h 10.0.2.14 -Bsse "show status like '%uptime';"; done
Uptime	3540
Uptime	3540
Uptime	3540
Uptime	3541
Uptime	3541
Uptime	3541
Uptime	3541
Uptime	3542
Uptime	3542
Uptime	3542
Uptime	3543
Uptime	3543
Uptime	3543
Uptime	3543
Uptime	3543
Uptime	3544
^C

What we initially wanted to do was to confirm the behavior that was reported by the customer. Given that all the app servers were remotely located (thus clients connecting over TCP), we wanted to confirm whether remote connections were actually being dropped (so a network issue? or MySQL unresponsive for some reason?). We also wanted to verify whether there was a pattern, i.e. one connection out of X being dropped, or connections being dropped after a certain amount of time. Having a pattern usually helps to identify what the root cause may be. Another reason for executing this remote connectivity loop was to verify whether the issue was happening only when connecting remotely or also with local connections (local connections were tested later).

Doing some troubleshooting on the network layer revealed nothing wrong, so we decided to locally start connecting to mysqld over TCP using another loop. This test showed that MySQL was indeed unavailable (or at least we could not access it at random intervals). Unfortunately, at that point, we didn’t test local connections through a socket. Connecting through a socket completely bypasses the network layer. If we had tried connecting using a socket we would have immediately realized that it was not actually a MySQL issue, as MySQL was always available (so something was blocking connections at the network level). Further details below.
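
As a side note, here is a minimal sketch of such a check, running the same probe over TCP and over the local socket side by side (the socket path is an assumption, adjust it to your my.cnf):

# Compare local TCP connections with local socket connections.
# If only the TCP loop fails, the problem sits in front of mysqld, not in mysqld itself.
for i in {1..100}; do
  mysql --protocol=TCP -h 127.0.0.1 -Bsse "SELECT 1;" >/dev/null 2>&1 || echo "TCP attempt $i failed"
  mysql --protocol=SOCKET -S /var/run/mysqld/mysqld.sock -Bsse "SELECT 1;" >/dev/null 2>&1 || echo "socket attempt $i failed"
done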

Moving troubleshooting forward, netstat revealed many connections in a TIME_WAIT state. TIME_WAIT indicates that the local side closed the connection first (it performed the active close) and is waiting for the timeout to expire. Below you can find an example, on a testing environment, of how you can use netstat to identify such TCP connections.
[RDBA] percona@db4-atsaloux: ~ $ sudo netstat -a -t
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:sunrpc          0.0.0.0:*               LISTEN
tcp        0      0 db4-atsaloux:42000      0.0.0.0:*               LISTEN
tcp        0      0 localhost:domain        0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:nrpe            0.0.0.0:*               LISTEN
tcp        0      0 db4-atsaloux:ssh        10.0.2.10:35230         ESTABLISHED
tcp        0     36 db4-atsaloux:ssh        10.0.2.10:39728         ESTABLISHED
tcp        0      0 db4-atsaloux:49154      10.0.2.11:mysql         ESTABLISHED
tcp6       0      0 [::]:mysql              [::]:*                  LISTEN
tcp6       0      0 [::]:sunrpc             [::]:*                  LISTEN
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN
tcp6       0      0 [::]:nrpe               [::]:*                  LISTEN
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50950         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50964         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50938         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50940         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51010         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50994         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50986         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:44110         ESTABLISHED
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50984         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50978         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51030         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50954         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51032         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51042         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50996         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51046         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51000         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50942         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:51004         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:44108         ESTABLISHED
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50992         TIME_WAIT
tcp6       0      0 db4-atsaloux:mysql      10.0.2.10:50988         TIME_WAIT

This made us believe that we may have run out of TCP connections on the TCP layer due to an increased number of TCP sessions which were left open until the TIME_WAIT timeout expired. An interesting blog post, “Application Cannot Open Another Connection to MySQL”, was written some time ago. It can give you a really good idea of what the “TIME_WAIT” issue is and what actions can be taken to remediate it.

We initially tried to fine-tune the port range (ip_local_port_range) and adjust some kernel-related options like tcp_tw_reuse, but unfortunately there was no luck. The behavior was still the same.
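
For reference, a rough sketch of the kind of tuning we attempted (the values below are purely illustrative, not recommendations):

# Inspect and widen the ephemeral port range, and allow reuse of TIME_WAIT sockets for new outgoing connections
sysctl net.ipv4.ip_local_port_range
sudo sysctl -w net.ipv4.ip_local_port_range="15000 65000"
sudo sysctl -w net.ipv4.tcp_tw_reuse=1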

Inspecting network traffic revealed that the host was making a huge number of requests to the DNS server defined in /etc/resolv.conf. Speaking of network traffic inspection, we were not able to verify a few things on the network layer, as the network infrastructure was not managed by us. We got confirmation from the customer’s IT department that nothing wrong was found on the network layer. What we could do was packet inspection of the traffic coming in and out of MySQL, and tcpdump helped identify the high number of DNS requests and the slow responses coming back. The command initially used for packet inspection on the db node was tcpdump dst port 3306 or src port 3306, and more specific filters were added afterward to exclude and filter out any non-helpful information such as traffic between master and slaves.
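
If you want to see the DNS traffic next to the MySQL traffic, a capture along these lines can help (the interface name here is an assumption):

# Capture MySQL traffic together with DNS queries/responses to spot slow resolvers
sudo tcpdump -i eth0 -nn 'port 3306 or port 53'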

At that time, another thing that came to mind was to verify whether, for any reason, mysqld was trying to do any DNS lookups. That could explain a problem during the initial negotiation. Checking the skip_name_resolve variable, we found that it was already ON, so mysqld should not perform any kind of DNS lookups.
db4-atsaloux (none)> select @@skip_name_resolve;
+---------------------+
| @@skip_name_resolve |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0.00 sec)

Trying to further debug what MySQL was actually doing, we started strace on the mysqld process.

The Root Cause

What we noticed was that mysqld was accessing the /etc/hosts.allow and /etc/hosts.deny files far too frequently. Voila!
root@db4-atsaloux:~# strace -e open,read -p$(pidof mysqld)
strace: Process 693 attached
# /etc/hosts.deny: list of hosts that are _not_ allowed to access the system.
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
# /etc/hosts.allow: list of hosts that are allowed to access the system.
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.deny: list of hosts"..., 4096) = 721
read(51, "", 4096)                      = 0
read(51, "# /etc/hosts.allow: list of host"..., 4096) = 464
read(51, "", 4096)                      = 0

From what we had seen, some new connections were taking a long time to connect to MySQL. The strace on the mysqld pid showed frequent access to the /etc/hosts.allow and /etc/hosts.deny files. These files are directly related to TCP wrappers! TCP wrappers are considered deprecated software by many sysadmins (development has stopped, although quite a few alternatives exist), but they are still widely used. With TCP wrappers, each new connection has to be checked against an ACL, and based on this ACL a decision is made whether the remote host is allowed to connect to the service or not.

The DNS lookups that we found while troubleshooting had nothing to do with the skip_name_resolve MySQL functionality, nor with mysqld itself, and at that point we knew that it was not actually a mysqld issue. MySQL was up and running, able to serve requests. Personally speaking, I’d describe this as a “by design” software Achilles’ heel.

Doing some more investigation, we found that there was a DNS misconfiguration (/etc/resolv.conf), so when DNS responses were slow or the DNS server was not responding, TCP wrappers caused connections to MySQL to stall, or simply be dropped, while waiting for the DNS response.

Conclusion

With that in mind, if for any reason you need to use TCP wrappers, please always be aware that any DNS issue may cause a stall or an outage. With TCP wrappers, a reverse lookup may happen even if you have skip_name_resolve enabled. Even if you have not manually installed TCP wrappers, some operating systems, like certain versions of CentOS or Ubuntu, have them available by default, so be careful if you are going to set up security rules for MySQL. Please note that not every single binary can make use of them; the binary has to be linked against the TCP wrappers library.

If TCP wrappers are enabled for the MySQL service and you really need them, you should always ensure that DNS responses will be fast and, for safety, even if this is not easily manageable, add any hosts connecting to mysqld to the /etc/hosts file so that a real DNS lookup won’t be needed for each connection. There are a few best practices for configuring DNS resolution. In my opinion, you should at least set more than one DNS server in /etc/resolv.conf, use local ones or the ones closest to your Linux server, and maybe enable some kind of caching.
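
A rough sketch of what that could look like (the hostnames, IPs, and resolvers below are examples, not values from this case):

# /etc/hosts - pin the app servers so each new connection avoids a live DNS lookup
10.0.2.10   app1.example.local
10.0.2.11   app2.example.local

# /etc/resolv.conf - more than one resolver, the closest/local ones first
nameserver 127.0.0.1
nameserver 10.0.2.2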

If you are facing a similar issue, you can check whether mysqld is built against TCP wrappers by running ldd on the mysqld binary and checking whether it is linked against the TCP wrappers library. If it is, and you face such an issue, then take a look at your system’s DNS configuration and whether DNS is performing well or not.
[RDBA] percona@db4-atsaloux: ~ $ ldd /usr/sbin/mysqld | grep libwrap
libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fa80532c000)

 

