I’m not completely sure, but I think my first message is relevant to the issue. The postfix container logs show:
Mar 28 21:17:35 43f41ba4dfe6 postfix/postscreen[641]: CONNECT from [172.22.1.1]:53130 to [172.22.1.253]:25
Mar 28 21:17:35 43f41ba4dfe6 postfix/postscreen[641]: ALLOWLISTED [172.22.1.1]:53130
Mar 28 21:17:35 43f41ba4dfe6 postfix/smtpd[645]: connect from unknown[172.22.1.1]
Mar 28 21:17:35 43f41ba4dfe6 postfix/smtpd[645]: warning: connect to Milter service inet:rspamd:9900: Connection refused
Mar 28 21:17:35 43f41ba4dfe6 postfix/smtpd[645]: NOQUEUE: milter-reject: CONNECT from unknown[172.22.1.1]: 451 4.7.1 Service unavailable - try again later; proto=SMTP
Mar 28 21:17:35 43f41ba4dfe6 postfix/smtpd[645]: NOQUEUE: milter-reject: EHLO from unknown[172.22.1.1]: 451 4.7.1 Service unavailable - try again later; proto=SMTP helo=<mail.mydomain.tld>
Mar 28 21:17:35 43f41ba4dfe6 postfix/smtpd[645]: lost connection after STARTTLS from unknown[172.22.1.1]
Mar 28 21:17:35 43f41ba4dfe6 postfix/smtpd[645]: disconnect from unknown[172.22.1.1] ehlo=1 starttls=0/1 commands=1/2
I see a lot of these “warning: connect to Milter service inet:rspamd:9900: Connection refused” messages, and the rspamd container has a very low uptime: it crashes continuously.
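I suspect postfix is waiting on rspamd for 2 minutes, then something “gives up” and the mail is accepted for delivery. (Strictly speaking, the log shows postfix answering immediately with a temporary 451 4.7.1 error, so it is probably the sending side that retries until rspamd is reachable again.)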
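To check whether these refused milter connections cluster around the moments rspamd goes down, a quick Python sketch like the one below could count them per minute. It assumes the postfix log has been saved to a text file; the embedded sample lines are copied from the output above, and the function name and file name are my own, hypothetical choices.

```python
import re
import sys
from collections import Counter

# Lines copied from the postfix output above; pass a saved log file as the
# first argument to analyse the real log instead.
SAMPLE = """\
Mar 28 21:17:35 43f41ba4dfe6 postfix/smtpd[645]: connect from unknown[172.22.1.1]
Mar 28 21:17:35 43f41ba4dfe6 postfix/smtpd[645]: warning: connect to Milter service inet:rspamd:9900: Connection refused
Mar 28 21:17:35 43f41ba4dfe6 postfix/smtpd[645]: NOQUEUE: milter-reject: CONNECT from unknown[172.22.1.1]: 451 4.7.1 Service unavailable - try again later; proto=SMTP
"""

# "Mon DD HH:MM:SS host postfix/smtpd[pid]: warning: connect to Milter service X: Connection refused"
# Group 1 keeps the timestamp truncated to the minute. re.search (not match)
# tolerates a "container | " prefix such as the one docker compose logs adds.
MILTER_REFUSED = re.compile(
    r"(\w{3}\s+\d{1,2} \d{2}:\d{2}):\d{2} \S+ postfix/smtpd\[\d+\]: "
    r"warning: connect to Milter service \S+: Connection refused"
)


def milter_refused_per_minute(lines):
    """Count refused milter connections per minute (syslog-style timestamps)."""
    counts = Counter()
    for line in lines:
        match = MILTER_REFUSED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            lines = f.read().splitlines()
    else:
        lines = SAMPLE.splitlines()
    # Plain string sort: good enough for logs covering a single day.
    for minute, n in sorted(milter_refused_per_minute(lines).items()):
        print(minute, n)
```

Usage would be something like `python3 milter_refused.py postfix.log` (both names hypothetical). The per-minute counts can then be compared with the times at which the rspamd container restarts.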
Now, why is rspamd crashing? In the rspamd container logs there’s something strange happening before the container crashes:
2026-03-28 21:41:36 #1(main) <62dfcf>; main; rspamd_term_handler: catch termination signal, waiting for 5 children for 60.00 seconds
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_srv_handler: cannot read from worker's srv pipe connection closed; command = heartbeat
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_check_termination_clause: normal process 35 terminated normally
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_cld_handler: do not respawn process normal after found terminated process with pid 35
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_srv_handler: cannot read from worker's srv pipe connection closed; command = heartbeat
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_check_termination_clause: rspamd_proxy process 33 terminated normally
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_cld_handler: do not respawn process rspamd_proxy after found terminated process with pid 33
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_srv_handler: cannot read from worker's srv pipe connection closed; command = heartbeat
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_check_termination_clause: controller process 34 terminated normally
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_cld_handler: do not respawn process controller after found terminated process with pid 34
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_srv_handler: cannot read from worker's srv pipe connection closed; command = heartbeat
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_check_termination_clause: fuzzy process 32 terminated normally
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_cld_handler: do not respawn process fuzzy after found terminated process with pid 32
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_srv_handler: cannot read from worker's srv pipe connection closed; command = heartbeat
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_check_termination_clause: hs_helper process 36 terminated normally
2026-03-28 21:41:37 #1(main) <62dfcf>; main; rspamd_cld_handler: do not respawn process hs_helper after found terminated process with pid 36
2026-03-28 21:41:37 #1(main) <62dfcf>; main; main: terminating...
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: cleaning up directory /var/lib/rspamd
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: remove stale hyperscan file /var/lib/rspamd/00a38f2bfa5673b0368e4abe706286fdf8eb38e0398c48524daa80a8f896f959.hsmp
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: remove stale hyperscan file /var/lib/rspamd/0f181b25300964b10e983c5ae5fd7ac24bf339eb9483340c0672a74d4a82e0a4.hsmp
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: remove stale hyperscan file /var/lib/rspamd/1ff7dc69bcd797d59b2fc702193cf502e58cb826502493dfad5317d8fd7fd834.hsmp
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: remove stale hyperscan file /var/lib/rspamd/374407ccd145b7108594953f8729bd49c17d21aa9262e6e3736acb010049ebe7.hsmp
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: remove stale hyperscan file /var/lib/rspamd/4aae0fb49664ac48a6222b91e1b474dc3c91e4be039be7c031ce7a9730884db1.hsmp
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: remove stale hyperscan file /var/lib/rspamd/787a7bacb83edba18c1b6fae72207bdfa2e672a523044e7d0e9aa5e74b644a7d.hsmp
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: remove stale hyperscan file /var/lib/rspamd/9ed02a10d8aa387d580cc8acafabd3617062db47388a3f8d0549dd9b0092ca9c.hsmp
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: remove stale hyperscan file /var/lib/rspamd/a7db4902c6a08a97c5cf92e3318fedf2d2c426406abebd374914cf040dfe84a2.hsmp
2026-03-28 21:41:37 #1(main) <hsxxxx>; hyperscan; cleanup_maybe: remove stale hyperscan file /var/lib/rspamd/fb5f98b301855dcaaf350f2ac1234b36b209de9e4daff99bcc7f05c59b6aa44a.hsmp
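I suspect this could be the root cause of this chain of errors, although the first line (“catch termination signal”) suggests the main process is being told to stop rather than crashing on its own, and the hyperscan cleanup may just be part of a normal shutdown.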
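To see how often rspamd goes down, and whether it happens at regular intervals, a similar sketch could pull the timestamp of every “catch termination signal” line out of the rspamd log and print the gaps between them. The regex is based only on the line format shown above, the sample lines are copied from it, and the function names are hypothetical.

```python
import re
import sys
from datetime import datetime

# Lines copied from the rspamd output above; pass a saved log file as the
# first argument to analyse the real log instead.
SAMPLE = """\
2026-03-28 21:41:36 #1(main) <62dfcf>; main; rspamd_term_handler: catch termination signal, waiting for 5 children for 60.00 seconds
2026-03-28 21:41:37 #1(main) <62dfcf>; main; main: terminating...
"""

# Matches the line the rspamd main process logs when it catches a
# termination signal; group 1 is the timestamp.
TERM_SIGNAL = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) #\d+\(main\).*"
    r"rspamd_term_handler: catch termination signal"
)


def termination_times(lines):
    """Return the times at which rspamd's main process caught a termination signal."""
    times = []
    for line in lines:
        match = TERM_SIGNAL.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Seconds between consecutive terminations (empty if fewer than two)."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            lines = f.read().splitlines()
    else:
        lines = SAMPLE.splitlines()
    times = termination_times(lines)
    for t in times:
        print(t.isoformat(sep=" "))
    print("gaps (s):", gaps_seconds(times))
```

With only the excerpt above there is a single termination and no gap to report. On the full log, roughly constant gaps would hint at something restarting rspamd on a schedule, but that is only a guess until the gaps are actually measured.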
I use mailcow 2026-01, and before upgrading I would like to understand this error. The container versions are:
- ghcr.io/mailcow/rspamd:3.14.2
- ghcr.io/mailcow/postfix:3.7.11-1
- redis:7.4.6-alpine
I’ll keep investigating; I don’t yet understand what’s happening.