Dear Community,

I’ve been using mailcow on an Ubuntu 22.04 VM, running on a Synology DS920+, for some time now, and I’m usually happy and find my way around (also thanks to this great community)… until I ran into the following problem this morning: the Dovecot container stopped, probably triggered by the watchdog, since all the other containers stopped as well.

The log shows:
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T02:00:01.634938320Z 2023-10-06 04:00:01,579 WARN received SIGTERM indicating exit request
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T02:00:01.748452680Z 2023-10-06 04:00:01,661 INFO waiting for processes, dovecot, syslog-ng to die
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T02:00:01.748516194Z Oct 6 04:00:01 65901a56010a syslog-ng[118]: syslog-ng shutting down; version='3.28.1'
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T02:00:01.916570784Z 2023-10-06 04:00:01,892 INFO stopped: syslog-ng (exit status 0)
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T02:00:03.220295971Z 2023-10-06 04:00:03,219 WARN received SIGQUIT indicating exit request
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T02:00:03.332303127Z 2023-10-06 04:00:03,330 INFO stopped: dovecot (exit status 0)
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T02:00:03.332442180Z 2023-10-06 04:00:03,331 INFO reaped unknown pid 127 (exit status 0)
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T02:00:03.380773089Z 2023-10-06 04:00:03,373 INFO stopped: processes (terminated by SIGTERM)

The container then remained stopped until I started it manually this morning without issues; it has been working fine since then:
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:37.082443505Z Uptime: 14160 Threads: 13 Questions: 19609 Slow queries: 0 Opens: 63 Open tables: 54 Queries per second avg: 1.384
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:38.563639985Z The user `vmail' is already a member of `tty'.
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:39.120033501Z % Total % Received % Xferd Average Speed Time Time Time Current
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:39.121424397Z Dload Upload Total Spent Left Speed
100 112k 100 112k 0 0 513k 0 --:--:-- --:--:-- --:--:-- 513k
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:39.369412501Z 20_blatspammer.cf
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:39.369582748Z 70_HS_body.cf
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:39.372495628Z 70_HS_header.cf
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:40.037083765Z 2023-10-06 07:56:40,036 INFO Set uid to user 0 succeeded
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:40.047497739Z 2023-10-06 07:56:40,045 INFO supervisord started with pid 1
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:41.062878900Z 2023-10-06 07:56:41,049 INFO spawned: 'processes' with pid 117
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:41.062940148Z 2023-10-06 07:56:41,054 INFO spawned: 'dovecot' with pid 118
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:41.071614843Z 2023-10-06 07:56:41,068 INFO spawned: 'syslog-ng' with pid 119
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:41.243220478Z [2023-10-06T07:56:41.242412] WARNING: With use-dns(no), dns-cache() will be forced to 'no' too!;
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:41.247488372Z Oct 6 07:56:41 65901a56010a syslog-ng[119]: syslog-ng starting up; version='3.28.1'
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:41.812114744Z Oct 6 07:56:41 65901a56010a dovecot: doveadm(ysup0v5rwpnropyg@mailcow.local): Error: User doesn't exist
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:42.814420462Z 2023-10-06 07:56:42,813 INFO success: processes entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:42.814466043Z 2023-10-06 07:56:42,814 INFO success: dovecot entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
mailcowdockerized-dovecot-mailcow-1 | 2023-10-06T05:56:42.814479736Z 2023-10-06 07:56:42,814 INFO success: syslog-ng entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

When I read through this morning’s logs, I see some Postfix issues, which seemingly resolved themselves, since the Postfix container was up and running.
I attached the log from today, 04:00 to 05:00 local time.

The only thing that happened around this time was the nightly renewal of the external IPv4/IPv6 addresses of my DSL connection.

It’s not too big of a deal to restart the container… if I’m at home (which often is not the case). So I would like to understand why the container didn’t come back up, i.e. where / in which log files I can find the root cause. Or how to avoid this issue altogether: can I safely disable the watchdog in the config file and run any system updates manually?
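For reference, what I mean by restarting the container and checking its logs is roughly the following (a sketch; the container names are taken from the log above, and the time range is just an example around the incident):

# start the stopped Dovecot container again
docker start mailcowdockerized-dovecot-mailcow-1

# watchdog and Dovecot logs for the incident window
docker logs --since "2023-10-06T01:30:00" --until "2023-10-06T05:00:00" mailcowdockerized-watchdog-mailcow-1
docker logs --since "2023-10-06T01:30:00" --until "2023-10-06T05:00:00" mailcowdockerized-dovecot-mailcow-1

# the Docker daemon log on the VM may also show why the container was stopped and not restarted
journalctl -u docker --since "2023-10-06 01:30" --until "2023-10-06 05:00"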

Many thanks for your support and best regards,
Stefan

Attachment: log-231006-02.txt (1 MB)
    mailcowdockerized-watchdog-mailcow-1 | 2023-10-06T02:18:30.016767651Z CRITICAL - Socket timeout

    Find out why the host is losing network connectivity. You have a lot of these in your watchdog log; I don’t have a single one in several weeks.

    Nullmeridian: The only thing that happened around this time was the nightly renewal of the external IPv4/IPv6 addresses of my DSL connection.

    What? You are using a dynamic IP address? For a mail server??

    Hi, yes I do; I update the IP address via an update script using “qmcgaw/ddns-updater”. No issues so far. For sending, I’m using a mail relay address from my provider. Works like a charm.
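    In case it helps, the updater runs as its own small container, roughly like this (a sketch from memory; the data path and web UI port are the image’s defaults as far as I recall, and the host directory name is just an example, so please check the qmcgaw/ddns-updater documentation):

    # example invocation of the updater container
    docker run -d --name ddns-updater \
      -v "$(pwd)/ddns-data:/updater/data" \
      -p 8000:8000/tcp \
      qmcgaw/ddns-updater
    # provider credentials and the records to update go into ddns-data/config.json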
    Any idea why the Dovecot container didn’t come back? Or at least where I could look?

    As there seem to be no errors in Dovecot, I guess something else is happening on that server, so that the watchdog thinks Dovecot is unresponsive and tries a clean shutdown. Maybe you are running a backup at that time? Or other things eating CPU or causing high I/O load?
    Check the watchdog logs.
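    Something like this would show whether CPU or I/O spikes around that time (a sketch; iostat and sar are part of the sysstat package and may need to be installed first, and sar only has history if sysstat’s data collection is enabled):

    # live per-container resource usage
    docker stats --no-stream
    # disk utilisation, sampled every 5 seconds, 3 samples
    iostat -x 5 3
    # load-average history collected by sysstat for today
    sar -q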

    I checked all logs (I attached the complete log starting at 4 am in my first post). I can’t find any obvious error message… I would have expected some logging from Dovecot or from the watchdog.

    Re memory: I assigned 7 GB, which is seemingly enough for a couple of mailboxes in 3 domains.
    (MiB Mem : 6929.5 total, 642.6 free, 3795.5 used, 2491.5 buff/cache)

    No other tasks running (my backup tasks start at 5am).

    OK, it seems I’m the only one with this issue, and to be honest, it has occurred only once since I switched from the Synology proprietary solution to mailcow some years ago, so let’s close this thread… thanks for your reply.

    Have a good evening,
    Stefan

    mailcowdockerized-watchdog-mailcow-1 | 2023-10-06T02:18:30.016767651Z CRITICAL - Socket timeout

    Find out why the host is losing network connectivity. You have a lot of these in your watchdog log; I don’t have a single one in several weeks.

    Hi, I checked both the Synology (the host of my VM) and the VM itself and couldn’t find any network-related issues. Also, why would only Dovecot be affected and none of the other containers, which all reported health = 100%?
    The Syno is directly attached to the switch/router, and both run stably.
    I disabled the watchdog (USE_WATCHDOG=n) and for the moment all is working fine, and it will probably keep working fine for the upcoming years :-) Thanks again for this great solution!
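    For anyone finding this thread later, the change boils down to the following (a sketch; docker compose needs to be run from the mailcow-dockerized directory so it picks up the updated mailcow.conf):

    # in mailcow.conf
    USE_WATCHDOG=n
    # then recreate the affected containers so the new setting is applied
    docker compose up -d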
