As I mentioned earlier, gpt.lua has a bug in the conditional check which caused GPT_CHECK to run for every mail.
Instead of waiting for the next Rspamd release and for the mailcow container to ship the fixed gpt.lua, I downloaded the latest gpt.lua from the Rspamd GitHub repository and placed it in plugins.d to override /usr/share/rspamd/plugins/lua/gpt.lua inside the container.
It does throw a warning about not re-registering the GPT_HAM and GPT_SPAM symbols, but it loads the updated version from plugins.d.
That solved my first issue: the GPT check is now only hit when BAYES hasn't been able to come up with a score.
So now, a couple of updates after 24 hours of observation (and after the changes I put in): the Bayesian learning (from the GPT output) has been really good, and the number of requests actually going to GPT evaluation is now far lower - mainly messages with low scores, roughly between 0 and -2.0.
That's a reduction in API hits of nearly 70%, while I've seen learning go up by over 50%.
That is a great outcome, and I'm guessing that over a longer period the actual OpenAI API use will only be there to periodically improve the Bayesian learning/training.
Now for the conditional email scanning - I figured this is done by overriding the condition function, mainly replicating the existing checks plus adding your own, where you could add:
- sender checks
- recipient checks
- subject checks
- specific content checks
You can see this in the gpt.lua source code, in the default_condition function from line 125, which gives you access to the task object:
https://github.com/rspamd/rspamd/blob/5ccf9bc7fb353c2bf20f7eb44feb283d4720bbdd/src/plugins/lua/gpt.lua
At the moment, I think these checks will be limited to config entries for the items to be checked, until I can figure out how to extend this properly and write a plugin myself.
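Just to illustrate the idea, here's a rough, untested sketch of what such an override of the condition function could look like in the overridden gpt.lua copy. The sender/recipient/subject rules and the domain/address values below are placeholders I made up, and the real default_condition does more checks than the Bayes one shown here:

```lua
-- Rough sketch of a custom condition function for gpt.lua (untested).
-- Returning false skips the GPT check for that message.
local function my_gpt_condition(task)
  -- Skip GPT when Bayes has already classified the message
  if task:has_symbol('BAYES_SPAM') or task:has_symbol('BAYES_HAM') then
    return false
  end

  -- Example sender check: skip mail from a trusted domain (placeholder value)
  local from = task:get_from('smtp')
  if from and from[1] and from[1].domain == 'example.com' then
    return false
  end

  -- Example recipient check: skip mail to a specific mailbox (placeholder value)
  local rcpts = task:get_recipients('smtp')
  if rcpts and rcpts[1] and rcpts[1].addr == 'noscan@example.com' then
    return false
  end

  -- Example subject check: skip obvious newsletters (placeholder pattern)
  local subject = task:get_subject() or ''
  if subject:lower():find('newsletter', 1, true) then
    return false
  end

  -- Everything else goes to GPT evaluation
  return true
end
```

A specific content check would work the same way, just going through the text parts (task:get_text_parts()) instead of the headers.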
TBH, I haven't actually made the above changes yet, as I first want to observe the learning/training aspect with GPT a bit longer.
Composites, which you mention, are basically the ability to take multiple rules (via the symbols attributed to them) and treat them as a separate rule in itself, with its own weight. They don't offer a way to evaluate the message or attach symbols based on its content, since they never dip into the message itself (the task object explained above).
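Just for reference, a composite definition in composites.conf looks roughly like this (the symbol names, expression and score are made-up placeholders, only to show the shape):

```
GPT_AND_BAYES_SPAM {
    expression = "GPT_SPAM & BAYES_SPAM";
    score = 3.0;
}
```

Note that it only works on symbols that have already been attached; it never looks at the message itself.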
The condition function override I mentioned above, on the other hand, gives you direct access to the message object and all its fields, including the MIME content, and lets you attach symbols for the final score assessment.
So you could have composites defined in composites.conf and use the composite symbol for additional score assessment, but I'm guessing that's not what you wanted.
In short, overriding the condition function gives you a programmatic way to control what gets evaluated via GPT and what gets bypassed.
I hope I’ve been able to address your question.
p.s.: I'm not a Lua programmer, but I learn quickly and it's more or less like Python. 🙂