I finally took the time to investigate this issue more deeply.
TL;DR: It’s not a problem with iptables/nftables, nor with Debian, mailcow or unbound. It’s just a non-existent DNS record or, sometimes, a lame server of a DNS blocklist.
I started Wireshark to log all network traffic on one of my mailcow hosts to finally find the reason for this.
I increased the verbosity of several daemons (postfix, rspamd and unbound).
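For reference, a capture like mine can also be taken on the host with tcpdump and opened in Wireshark afterwards; the interface and filter here are assumptions, adjust them to your setup:
tcpdump -i any -n -s0 -w /tmp/mailcow-dns.pcap port 53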
With that amount of information I analysed every single occurrence of these messages:
dockerd * level=warning msg="[resolver] failed to read from DNS server: *" error="read udp 172.22.1.253:43121->172.22.1.254:53: i/o timeout"
None of the occurrences led me to a reason why this happens, so I decided to dive even deeper: Docker network namespaces…
Mailcow uses a custom network, “mailcow-network”, and assigns the unbound container as DNS server to most of the containers. Inside a container, /etc/resolv.conf specifies 127.0.0.11 as nameserver. So how do the containers reach the unbound container? This is where iptables comes into play. As you can see below, Docker creates DNAT and SNAT entries to rewrite the DNS requests to its embedded DNS server. The embedded DNS server then forwards the request either to the host’s DNS servers or to those specified with the dns config option. That’s why the log messages state “dockerd” as the daemon of the message.
root@mail01:/opt/mailcow-dockerized# nsenter -n -t $(docker inspect --format {{.State.Pid}} $(docker ps -qf name=postfix-mailcow)) iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 167 packets, 9988 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 167 packets, 9988 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 188 packets, 14054 bytes)
pkts bytes target prot opt in out source destination
331 26899 DOCKER_OUTPUT all -- * * 0.0.0.0/0 127.0.0.11
Chain POSTROUTING (policy ACCEPT 519 packets, 40953 bytes)
pkts bytes target prot opt in out source destination
331 26899 DOCKER_POSTROUTING all -- * * 0.0.0.0/0 127.0.0.11
Chain DOCKER_OUTPUT (1 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- * * 0.0.0.0/0 127.0.0.11 tcp dpt:53 to:127.0.0.11:43303
331 26899 DNAT udp -- * * 0.0.0.0/0 127.0.0.11 udp dpt:53 to:127.0.0.11:50947
Chain DOCKER_POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 SNAT tcp -- * * 127.0.0.11 0.0.0.0/0 tcp spt:43303 to::53
0 0 SNAT udp -- * * 127.0.0.11 0.0.0.0/0 udp spt:50947 to::53
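With these rules in mind you can also query the embedded DNS server directly from within the container’s network namespace, bypassing the NAT. This is just a sketch: the UDP port (50947 here) is taken from the DNAT rule above and changes on every container start, so look it up first:
nsenter -n -t $(docker inspect --format {{.State.Pid}} $(docker ps -qf name=postfix-mailcow)) dig @127.0.0.11 -p 50947 example.org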
With all that information and no misconfiguration or real problem found, I thought I should try those failing DNS queries from one or more of my other servers. I did not only ask my locally installed DNS resolvers, but also Google and Cloudflare. Sometimes the queries failed even when using Google’s or Cloudflare’s nameservers.
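These are the kinds of checks I mean; the query name here is only an example (the conventional DNSBL test entry for 127.0.0.2), substitute whatever name is failing for you:
dig 2.0.0.127.dnsbl.sorbs.net A
dig @8.8.8.8 2.0.0.127.dnsbl.sorbs.net A
dig @1.1.1.1 2.0.0.127.dnsbl.sorbs.net A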
So I started to investigate the corresponding DNS requests with dig and the “+trace” option, which follows the delegation chain from the root servers down to the corresponding record. What I found was that most of the DNS requests leading to those error messages are related to either lame DNS servers or to plainly non-existent DNS records.
A good example of that is “dnsbl.sorbs.net”, which has so many lame servers that you nearly always get no response or a SERVFAIL for your requests. See https://mxtoolbox.com/SuperTool.aspx?action=a%3adnsbl.sorbs.net&run=toolpage
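You can verify this yourself by following the delegation chain; if a zone is served by lame servers, the trace stalls at nameservers that do not answer or ends in a SERVFAIL instead of a record:
dig +trace dnsbl.sorbs.net A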
As I couldn’t find a real problem even after all this time and analysis, I will now create a logcheck ignore regex (logcheck is the tool that brought these error messages to my attention in the first place) and forget about it.
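A rule along these lines should do it, assuming the usual Debian logcheck layout (a file under /etc/logcheck/ignore.d.server/); the timestamp/hostname prefix depends on your syslog format, so treat it as a sketch and test it before relying on it:
^\w{3} [ :0-9]{11} [._[:alnum:]-]+ dockerd\[[0-9]+\]: .*level=warning msg="\[resolver\] failed to read from DNS server.*i/o timeout"$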
Kind regards,
Timo