The Bayes expiry module provides intelligent expiration of statistical tokens for the new schema of Redis statistics storage.
Configuration settings for the bayes expiry module should be added to the corresponding classifier section (for instance, in local.d/classifier-bayes.conf).
The Bayes expiry module requires the new statistics schema. It should be enabled in the classifier configuration:
new_schema = true; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
The following settings are valid:
expire: the TTL, in seconds, that bayes expiry should set for tokens. Does not affect common tokens. See expiration modes for details. Besides a TTL in seconds, the supported values are:
    -1: make tokens persistent;
    false: disable bayes expiry for the classifier. Does not affect TTLs of existing tokens. This means tokens that already have TTLs will be expired by Redis. Newly learned tokens will be persistent.
lazy (before 2.0): true - enable lazy expiration mode (disabled by default). See expiration modes for details.
Configuration example:
new_schema = true; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
expire = 8640000;
#lazy = true; # Before 2.0
Every minute, the bayes expiry module executes an expiry step. On each step it checks the frequencies of about 1000 statistical tokens and updates their TTLs if necessary. The time to complete a full iteration depends on the number of tokens. For instance, a full expiry cycle for 10 million tokens takes about a week. When the bayes expiry module finishes a full iteration, it starts over again.
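The "about a week" figure can be checked with simple arithmetic. The sketch below assumes exactly one step per minute and exactly 1000 tokens per step (the text only says "about 1000"); the function name is hypothetical and used only for illustration.

```python
# Rough estimate of a full expiry cycle, assuming one step per minute
# and 1000 tokens checked per step (approximate figures from the text).
TOKENS_PER_STEP = 1000
STEP_INTERVAL_SECONDS = 60
SECONDS_PER_DAY = 86_400


def full_cycle_days(total_tokens: int) -> float:
    """Return the approximate duration of one full expiry cycle in days."""
    steps = -(-total_tokens // TOKENS_PER_STEP)  # ceiling division
    return steps * STEP_INTERVAL_SECONDS / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{full_cycle_days(10_000_000):.1f} days")  # prints "6.9 days"
```

For 10 million tokens this gives 10,000 steps, or roughly 6.9 days, which matches the estimate above.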
The Bayes expiry module distinguishes four groups of tokens based on the frequency of their occurrence in ham and spam classes: significant, insignificant, infrequent and common tokens.
Default mode was removed in Rspamd 2.0 as it has no advantages over lazy mode.
Operation:
- Extend a significant token’s lifetime: update the token’s TTL to the expire value every time.
- Do nothing with an insignificant or infrequent token.
- Common token: reset the TTL to a low value (10d) if the token has a greater TTL.
Disadvantages:
- Tokens expire after the expire time unless their TTLs are refreshed, so TTLs need to be periodically updated by the bayes expiry module. This means that backing up statistics requires special procedures. If you just make a copy of the *.rdb file, you should know that it has a “shelf-life”. If you restore it after the expire time has passed, all tokens will be expired.
- Updating the TTLs of significant tokens is unnecessary if Redis has no eviction policy configured that would evict significant tokens.
Lazy mode is the only expiration mode since Rspamd 2.0.
Operation:
- Make a significant token persistent if it has a TTL.
- Set the TTL of an insignificant or infrequent token to the expire value if its current TTL is greater than expire.
- Common token: reset the TTL to a low value (10d) if the token has a greater TTL.
Advantages:
- Significant tokens are persistent, so they are not lost when the expire time passes.
To enable lazy expiration mode in Rspamd before 2.0, add lazy = true; to the classifier configuration.
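The lazy mode rules can be summarized as a small decision function. This is a simplified sketch for illustration, not Rspamd's actual implementation: the function name, the group strings, and the choice to model a persistent token as having an infinite TTL are all assumptions. How a token is assigned to a group is not covered here, so the group is passed in as an argument, and expire is assumed to be a positive number of seconds.

```python
import math
from typing import Optional

PERSISTENT = math.inf    # assumption: model "no TTL" as an infinite TTL
COMMON_TTL = 10 * 86400  # the "low value (10d)" from the text, in seconds


def lazy_mode_ttl(group: str, current_ttl: float, expire: float) -> Optional[float]:
    """Return the TTL to apply in lazy mode, or None to leave the token untouched."""
    if group == "significant":
        # Make the token persistent if it currently has a TTL.
        return PERSISTENT if current_ttl != PERSISTENT else None
    if group in ("insignificant", "infrequent"):
        # Lower the TTL to `expire` only when it is currently greater.
        return expire if current_ttl > expire else None
    if group == "common":
        # Reset to the low value only when the current TTL is greater.
        return COMMON_TTL if current_ttl > COMMON_TTL else None
    raise ValueError(f"unknown token group: {group!r}")


if __name__ == "__main__":
    expire = 8_640_000  # 100 days, as in the configuration example
    print(lazy_mode_ttl("significant", 5_000_000, expire))  # inf
    print(lazy_mode_ttl("infrequent", 9_000_000, expire))   # 8640000
    print(lazy_mode_ttl("infrequent", 1_000_000, expire))   # None
    print(lazy_mode_ttl("common", 5_000_000, expire))       # 864000
```

Note that TTLs are only ever lowered or removed, never raised to a finite value, which is why most tokens need no update on a given expiry step.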
The expiration mode for an existing statistics database can be changed in the configuration at any time. Token TTLs will be changed as necessary during the next expiry cycle.
If the new expire value is lower than the current one, TTLs greater than the new expire value will be changed during the next expiry cycle.
To set an expire value greater than the current one, first make tokens persistent (set expire = -1;) and wait until at least one expiry cycle has completed.
Then you can set the new expire value.
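The procedure for changing the expire value can be written down as a short planning helper. This is only an illustration of the steps described above; the function name is hypothetical, and it assumes both values are positive TTLs in seconds.

```python
from typing import List


def expire_change_plan(current: int, new: int) -> List[str]:
    """Return the ordered configuration steps for changing the expire value."""
    if current <= 0 or new <= 0:
        raise ValueError("this sketch only handles positive TTLs in seconds")
    if new == current:
        return []
    if new < current:
        # Lowering: greater TTLs are changed during the next expiry cycle.
        return [f"expire = {new};"]
    # Raising: make tokens persistent first, then set the new value.
    return [
        "expire = -1;",
        "wait for at least one full expiry cycle",
        f"expire = {new};",
    ]


if __name__ == "__main__":
    for step in expire_change_plan(864_000, 8_640_000):
        print(step)
```

The extra step when raising the value follows from TTLs only being changed when they are greater than expire: a token whose TTL is lower than the new value would otherwise be left as it is.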
The Redis maxmemory directive and volatile-ttl eviction policy can be used to set a memory limit for the statistics dataset. Redis checks the memory usage, and if it is greater than the maxmemory limit, it evicts keys with a shorter TTL according to the policy. It is also possible to keep memory usage at an almost constant level by setting the TTL to a very high value, so that keys never expire but are evicted instead.
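The effect of volatile-ttl can be illustrated with an idealized simulation. This is a sketch under stated assumptions, not how Redis is implemented: real Redis picks eviction candidates by sampling keys, so the result is approximate, and the function name, key names and sizes below are made up for the example.

```python
from typing import Dict, List, Tuple


def evict_volatile_ttl(keys: Dict[str, Tuple[int, int]], maxmemory: int) -> List[str]:
    """Return the keys an idealized volatile-ttl policy would evict.

    `keys` maps a key name to (size_in_bytes, ttl_in_seconds);
    a TTL of -1 marks a persistent key, which is never a candidate.
    """
    used = sum(size for size, _ in keys.values())
    # Only keys with a TTL are candidates; the shortest TTL goes first.
    candidates = sorted(
        (name for name, (_, ttl) in keys.items() if ttl >= 0),
        key=lambda name: keys[name][1],
    )
    evicted = []
    for name in candidates:
        if used <= maxmemory:
            break
        used -= keys[name][0]
        evicted.append(name)
    return evicted


if __name__ == "__main__":
    tokens = {"a": (100, 50), "b": (100, 500), "c": (100, -1), "d": (100, 5)}
    print(evict_volatile_ttl(tokens, maxmemory=250))  # ['d', 'a']
```

Persistent keys are never evicted under this policy, which is relevant in lazy mode, where significant tokens are made persistent.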
To apply the memory limit and eviction policy only to the Bayesian statistics dataset, store it in a separate Redis instance. A detailed explanation of multi-instance Redis configuration can be found in the Redis replication tutorial.
local.d/classifier-bayes.conf:
backend = "redis"; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
servers = "localhost:6378";
new_schema = true; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
expire = 2144448000;
lazy = true; # Before 2.0
Where expire = 2144448000; sets a very high TTL (68 years), as we do not need to actually expire keys.
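The expire values used in this document are given in seconds, and a quick conversion confirms the quoted durations. The helper below is a minimal sketch; its name is hypothetical, and it assumes 365-day years.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def describe_ttl(seconds: int) -> str:
    """Render a TTL in seconds as whole days and (non-leap) years."""
    days = seconds // SECONDS_PER_DAY
    years = days // DAYS_PER_YEAR
    return f"{seconds} s = {days} days (about {years} years)"


if __name__ == "__main__":
    print(describe_ttl(8_640_000))      # 8640000 s = 100 days (about 0 years)
    print(describe_ttl(2_144_448_000))  # 2144448000 s = 24820 days (about 68 years)
```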
/usr/local/etc/redis-bayes.conf:
include /usr/local/etc/redis.conf
port 6378
pidfile /var/run/redis/bayes.pid
logfile /var/log/redis/bayes.log
dbfilename bayes.rdb
dir /var/db/redis/bayes/
maxmemory 500MB
maxmemory-policy volatile-ttl
Where maxmemory 500MB sets Redis to use the specified amount of memory for the instance’s dataset, and maxmemory-policy volatile-ttl sets the eviction policy that Redis applies when the maxmemory limit is reached.