ImapJunkExporter - Learn spam

WolfTongue · Feb 10, 2024

Hello,

If you encrypt the emails on the hard drive, you unfortunately can no longer use automatic spam learning. Since I still want to use it for selected email addresses, I have written a tool for this purpose.

The tool is relatively simple. At regular intervals, it downloads all emails that have been manually moved to the spam folder and exports them to the hard drive as unencrypted “.eml” files.

The learning itself must then be done on the server with a cron job, as described in the official instructions. The only disadvantage is that every email account has to be stored in a configuration file. For my few email accounts this is a good solution for now, but I still hope that a future feature will make automatic learning possible.

Link including detailed documentation (in English): MyUncleSam/ImapJunkExporter

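The workflow from the first post (periodically export the Junk folder as plain .eml files, then learn them on the server from a cron job) can be sketched in shell as below. This is not ImapJunkExporter's implementation, only a minimal illustration under assumptions: curl built with IMAP support and recent enough to understand the ";MAILINDEX=" URL part, a folder literally named "Junk", a mailcow checkout at the hypothetical path /opt/mailcow-dockerized, and that rspamc inside the rspamd-mailcow container reaches the rspamd controller without extra options (otherwise rspamc's --password option may be needed). All function names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of the two-step workflow; NOT ImapJunkExporter's code.
# Assumptions (hypothetical): curl with IMAP support that understands
# ";MAILINDEX=", a folder named "Junk", mailcow checked out in MAILCOW_DIR.

MAILCOW_DIR=${MAILCOW_DIR:-/opt/mailcow-dockerized}

# Turn an IMAP reply line such as "* SEARCH 3 7 9" into "3 7 9".
parse_search() {
  tr -d '\r' | sed -n 's/^\* SEARCH *//p'
}

# Step 1 - export_junk HOST USER PASS FOLDER OUTDIR
# Downloads every message in FOLDER as a plain-text .eml file.
export_junk() {
  host=$1 user=$2 pass=$3 folder=$4 outdir=$5
  mkdir -p "$outdir"
  # "?ALL" makes curl select the folder and send "SEARCH ALL"; the reply
  # lists message sequence numbers, which are not stable across runs.
  ids=$(curl --silent --show-error --user "$user:$pass" \
    "imaps://$host/$folder?ALL" | parse_search)
  for n in $ids; do
    # ";MAILINDEX=n" fetches message number n as raw RFC 822 text.
    curl --silent --show-error --user "$user:$pass" \
      "imaps://$host/$folder/;MAILINDEX=$n" -o "$outdir/$user-$n.eml"
  done
}

# Feed one message (read from stdin) to rspamd as spam.
learn_one() {
  (cd "$MAILCOW_DIR" && docker compose exec -T rspamd-mailcow rspamc learn_spam)
}

# Step 2 - learn_dir DIR
# Learns every DIR/*.eml and moves it to DIR/learned on success, so the next
# cron run does not feed it again; failures stay in DIR and are reported.
learn_dir() {
  dir=$1
  mkdir -p "$dir/learned"
  for f in "$dir"/*.eml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if learn_one < "$f"; then
      mv "$f" "$dir/learned/"
    else
      echo "learning failed: $f" >&2
    fi
  done
}
```

Hypothetical cron usage: source the script, then run export_junk mail.example.com user@example.com 'secret' Junk /srv/junk-export followed by learn_dir /srv/junk-export. Because IMAP sequence numbers shift whenever the folder changes, a real exporter would track UIDs (or remember what it already exported) so the same message is not fed to rspamd twice. Passing the password on the command line also exposes it to other local users; curl's --netrc-file option avoids that.
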
esackbauer · Feb 10, 2024

WolfTongue wrote: "if you encrypt the emails on the hard drive, unfortunately, you can no longer use the automatic learning of spam."

Is that really needed?
See here for learning Spam/Ham:
Work with Spam Data - mailcow: dockerized documentation (docs.mailcow.email)

"Rspamd learns mail as spam or ham when you move a message in or out of the junk folder to any mailbox besides trash. This is achieved by using the Sieve plugin “sieve_imapsieve” and parser scripts."

WolfTongue · Feb 10, 2024

My mailcow stores only encrypted mails on local storage, so there is no folder of unencrypted mails that rspamd could iterate over. In my view, this therefore does not work with encrypted mail storage.

If I am wrong, let me know. I have never tried to feed rspamd with encrypted mail files, mainly because rspamd should have no knowledge of the encryption.

(I checked the documentation again and it also says “You can use a one-liner to learn mail in plain-text (uncompressed) format”)

esackbauer · Feb 10, 2024

WolfTongue
As far as I know, decryption only matters for access from outside mailcow, i.e. on the file system. rspamd receives the mails via imapsieve, so no file access is necessary, and Dovecot only hands decrypted mail to imapsieve.
Else it would make zero sense to learn spam/ham by moving mails between folders.
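To see which folder moves trigger the learning scripts on a given installation, one can inspect Dovecot's effective imapsieve settings; in mailcow these are expected to be part of the global Dovecot configuration rather than a user's sogo.sieve file. A minimal sketch, assuming it is run from the mailcow-dockerized directory with Docker Compose v2; the function names are hypothetical, and the exact setting names printed depend on the mailcow version.

```shell
# Keep only imapsieve-related lines from `doveconf -n` output (read on stdin).
imapsieve_settings() {
  grep -i 'imapsieve'
}

# `doveconf -n` prints the settings that differ from Dovecot's defaults;
# run it inside the dovecot-mailcow container and filter the imapsieve part.
show_imapsieve() {
  docker compose exec -T dovecot-mailcow doveconf -n | imapsieve_settings
}
```

Calling show_imapsieve should list the mailbox names and the Sieve scripts bound to them, which is where report-spam.sieve and report-ham.sieve would show up if they are active.
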

WolfTongue · Feb 10, 2024

Over the last few years I moved a lot of spam into my junk folder, but I kept receiving the same kind of messages again and again. After introducing my solution, I learned them, and now they no longer appear (rspamd now filters them). So if there is a Sieve filter for spam, it is not working. On my system I use Sieve filters to move mails, so Sieve in general works fine.

I think automatic spam learning is either missing or not working as expected, so moving mails into the spam folder does nothing for me. I also checked the sieve folder of my mail account, and the only entries there are rules for moving mails, nothing else (/var/lib/docker/volumes/mailcowdockerized_vmail-vol-1/_data/domain.tld/username/sieve/sogo.sieve).

I checked the configuration, and it seems that only Dovecot knows the keys for decryption and encryption (mailcowdockerized_crypt-vol-1). So it looks to me as if nothing is triggered when mails are moved into or out of the junk folder, and moving them into Junk just places them in another folder without any effect.

DocFraggle · Feb 10, 2024

Sorry, but that’s not true. Here you can find the ham and spam sieve files (report-ham.sieve and report-spam.sieve) which trigger the corresponding ham and spam bash scripts (rspamd-pipe-ham and rspamd-pipe-spam):

mailcow/mailcow-dockerized/tree/master/data/Dockerfiles/dovecot

You can easily test that it works by adding a line that logs something to a file in either rspamd-pipe-ham or rspamd-pipe-spam.

Edit: to clarify, I added the following line to /usr/lib/dovecot/sieve/rspamd-pipe-ham inside the dovecot container:

cat ${FILE} > /tmp/hamlog

Then I moved a mail from my Junk folder to the Inbox folder. The unencrypted content of the moved mail was in /tmp/hamlog afterwards.
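The test line above overwrites /tmp/hamlog on every run. When checking whether several moves each trigger the pipe script, an appending variant with a timestamp per message can help. A minimal sketch, assuming (as the line above does) that ${FILE} holds the message inside rspamd-pipe-ham; the helper name log_pipe_input is hypothetical. Since it writes mail contents in plain text to /tmp, it should only stay in the script for the duration of a test.

```shell
# Hypothetical debug helper for rspamd-pipe-ham / rspamd-pipe-spam.
# log_pipe_input MSGFILE LOGFILE: append a timestamp header, then MSGFILE.
log_pipe_input() {
  {
    printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
    cat "$1"
  } >> "$2"
}

# Inside the pipe script it would be called as:
# log_pipe_input "${FILE}" /tmp/hamlog
```

Each move then adds one "=== date ===" header to the log, so counting headers shows how many times the script ran.
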

I even logged the output of the 3 curl commands into a logfile:

{"success":true,"hashes":["16010468719eebce14f431cc0499298a9deb79f141481cf22b07f4742a1d935b1dec6465be7c6c4415e540821914ecdb44708bbedc69d96b119a77ae031986e2","9b3e1f31bff4f5457e6acd07a0a13fe0d37e508c9d9049c0d165ad8d7b4ce9137baa4e5af50ac5a041fb11ddffad7ceec48d90323fe5b2bf2969f4a1655c1c6d"]}
{"success":true}
{"success":true,"hashes":["16010468719eebce14f431cc0499298a9deb79f141481cf22b07f4742a1d935b1dec6465be7c6c4415e540821914ecdb44708bbedc69d96b119a77ae031986e2","9b3e1f31bff4f5457e6acd07a0a13fe0d37e508c9d9049c0d165ad8d7b4ce9137baa4e5af50ac5a041fb11ddffad7ceec48d90323fe5b2bf2969f4a1655c1c6d"]}

That’s rspamd confirming the data.

WolfTongue · Feb 10, 2024

DocFraggle
I added it, and the logs are produced. I also ran the commands one after another, and each returned a success message.

So it basically seems to work, but this raises another question for me:
Over the past months I received spam messages, several of them with the same content again and again. I checked and found more than 20 mails with identical content that I had moved into the junk folder, yet the same message kept being delivered to my mailbox. This only stopped after I learned them with my solution.

So if this command does learn spam in rspamd, why is it so ineffective, in my case as if nothing were learned at all? And why is my solution so much more effective?

For now I am keeping my solution running, because it is currently my only way to fight spam :-(