I finally took the time to investigate this issue more deeply.
TL;DR: It’s not a problem with iptables/nftables, nor with Debian, mailcow or unbound. It’s just a non-existent DNS record or, sometimes, a lame server of a DNS blocklist.
I started Wireshark to log all network traffic on one of my mailcow hosts to finally find the reason for this.
I increased the verbosity of several daemons (postfix, rspamd and unbound).
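For reference, a capture like mine can also be taken on the host with tcpdump and opened in Wireshark afterwards; the interface and filter here are assumptions, adjust them to your setup:
tcpdump -i any -n -s0 -w /tmp/mailcow-dns.pcap port 53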
With that amount of information I analysed every single occurrence of these messages:
dockerd * level=warning msg="[resolver] failed to read from DNS server: *" error="read udp 172.22.1.253:43121->172.22.1.254:53: i/o timeout"
None of the occurrences led me to a reason why this happens, so I decided to dive even deeper: Docker network namespaces…
Mailcow uses a custom network, “mailcow-network”, and assigns the unbound container as DNS server to most of the containers. Inside a container, /etc/resolv.conf specifies 127.0.0.11 as nameserver. So how do the containers reach the unbound container? This is where iptables comes into play. As you can see below, Docker creates DNAT and SNAT entries to rewrite the DNS requests to its embedded DNS server. The embedded DNS server then forwards the request either to the host’s DNS servers or to those specified with the dns config option. That’s why the log messages state “dockerd” as the daemon of the message.
root@mail01:/opt/mailcow-dockerized# nsenter -n -t $(docker inspect --format {{.State.Pid}} $(docker ps -qf name=postfix-mailcow)) iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 167 packets, 9988 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 167 packets, 9988 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 188 packets, 14054 bytes)
pkts bytes target prot opt in out source destination
331 26899 DOCKER_OUTPUT all -- * * 0.0.0.0/0 127.0.0.11
Chain POSTROUTING (policy ACCEPT 519 packets, 40953 bytes)
pkts bytes target prot opt in out source destination
331 26899 DOCKER_POSTROUTING all -- * * 0.0.0.0/0 127.0.0.11
Chain DOCKER_OUTPUT (1 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- * * 0.0.0.0/0 127.0.0.11 tcp dpt:53 to:127.0.0.11:43303
331 26899 DNAT udp -- * * 0.0.0.0/0 127.0.0.11 udp dpt:53 to:127.0.0.11:50947
Chain DOCKER_POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 SNAT tcp -- * * 127.0.0.11 0.0.0.0/0 tcp spt:43303 to::53
0 0 SNAT udp -- * * 127.0.0.11 0.0.0.0/0 udp spt:50947 to::53
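With these rules in mind you can also query the embedded DNS server directly from within the container’s network namespace, bypassing the NAT. This is just a sketch: the UDP port (50947 here) is taken from the DNAT rule above and changes on every container start, so look it up first:
nsenter -n -t $(docker inspect --format {{.State.Pid}} $(docker ps -qf name=postfix-mailcow)) dig @127.0.0.11 -p 50947 example.org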
With all that information and no misconfiguration or real problem found, I thought I should try those failing DNS queries from one or more of my other servers. I did not only ask my locally installed DNS resolvers, but also Google and Cloudflare. Sometimes the queries failed even when using Google’s or Cloudflare’s nameservers.
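These are the kinds of checks I mean; the query name here is only an example (the conventional DNSBL test entry for 127.0.0.2), substitute whatever name is failing for you:
dig 2.0.0.127.dnsbl.sorbs.net A
dig @8.8.8.8 2.0.0.127.dnsbl.sorbs.net A
dig @1.1.1.1 2.0.0.127.dnsbl.sorbs.net A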
So I started to investigate the corresponding DNS requests with dig and the “+trace” option, which follows the delegation chain from the root servers down to the corresponding record. What I found was that most of the DNS requests leading to those error messages are related to either lame DNS servers or to plainly non-existent DNS records.
A good example of that is “dnsbl.sorbs.net”, which has so many lame servers that you nearly always get no response or a SERVFAIL for your requests. See https://mxtoolbox.com/SuperTool.aspx?action=a%3adnsbl.sorbs.net&run=toolpage
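You can verify this yourself by following the delegation chain; if a zone is served by lame servers, the trace stalls at nameservers that do not answer or ends in a SERVFAIL instead of a record:
dig +trace dnsbl.sorbs.net A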
As I couldn’t find a real problem even after all this time and analysis, I will now create a logcheck ignore regex (logcheck is the tool that brought these error messages to my attention in the first place) and forget about it.
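A rule along these lines should do it, assuming the usual Debian logcheck layout (a file under /etc/logcheck/ignore.d.server/); the timestamp/hostname prefix depends on your syslog format, so treat it as a sketch and test it before relying on it:
^\w{3} [ :0-9]{11} [._[:alnum:]-]+ dockerd\[[0-9]+\]: .*level=warning msg="\[resolver\] failed to read from DNS server.*i/o timeout"$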
Kind regards,
Timo