The Bayes expiry module provides intelligent expiration of statistical tokens for the
new schema of Redis statistics storage.
Configuration settings for the bayes expiry module should be added to the corresponding classifier section (for instance in the
The Bayes expiry module requires the new statistics schema, which must be enabled in the classifier configuration:
new_schema = true;
The following settings are valid:
expire: the TTL that bayes expiry should set for tokens. Does not affect common tokens. See expiration modes for detail. Supported values are:
a time in seconds;
-1: make tokens persistent;
false: disable bayes expiry for the classifier. Does not affect TTLs of existing tokens. This means tokens that already have TTLs will be expired by Redis, while newly learned tokens will be persistent.
lazy: true - enable lazy expiration mode (disabled by default). See expiration modes for detail.
new_schema = true;
expire = 8640000;
#lazy = true;
The bayes expiry module periodically executes an expiry step. On each step it checks the frequencies of about 1000 statistical tokens and updates their TTLs if necessary. The time to complete a full iteration depends on the number of tokens. For instance, a full expiry cycle for 10 million tokens takes about a week. When the
bayes expiry module finishes a full iteration, it starts over again.
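The figures above can be checked with a little arithmetic. The sketch below is only an estimate: the 1000 tokens per step comes from the text, while the one-minute step interval is an assumption chosen because it reproduces the "about a week" figure. The function and constant names are hypothetical.

```python
import math

TOKENS_PER_STEP = 1000       # from the text: about 1000 tokens per step
STEP_INTERVAL_SECONDS = 60   # assumption: one expiry step per minute


def full_cycle_days(total_tokens: int,
                    tokens_per_step: int = TOKENS_PER_STEP,
                    step_interval: int = STEP_INTERVAL_SECONDS) -> float:
    """Estimate how many days one full expiry iteration takes."""
    steps = math.ceil(total_tokens / tokens_per_step)
    return steps * step_interval / 86400  # 86400 seconds per day


print(round(full_cycle_days(10_000_000), 1))  # 6.9
```

Under these assumptions, 10 million tokens need 10,000 steps, which is a little under seven days.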
Bayes expiry module distinguishes four groups of tokens based on frequency of their occurrence in ham and spam classes:
significant token’s lifetime: update the token’s TTL every time to the expire value.
common token: reset the TTL to a low value (10d) if the token has a greater TTL.
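The two rules above can be sketched as a small decision function. This is an illustration, not the module's actual code: the function name, the group labels as strings, and the use of None for a persistent token are all assumptions, and it assumes a significant token's TTL is refreshed to the expire setting.

```python
from typing import Optional

COMMON_TTL = 10 * 86400  # the "low value (10d)" from the text, in seconds


def default_mode_ttl(group: str, ttl: Optional[int], expire: int) -> Optional[int]:
    """Return the TTL a token would have after one check.

    ttl is the current TTL in seconds, or None for a persistent token.
    """
    if group == "significant":
        # Lifetime is extended: the TTL is refreshed to `expire` on every check.
        return expire
    if group == "common":
        # Only lower the TTL. Leaving a persistent token (None) alone is an
        # assumption; the text does not cover that case.
        if ttl is not None and ttl > COMMON_TTL:
            return COMMON_TTL
        return ttl
    # Other groups are not described in the rules above, so leave them unchanged.
    return ttl
```

For example, with expire = 8640000 a common token holding a TTL of 8640000 seconds would be cut to 864000 seconds (10 days), while a common token with a TTL of 500 seconds would be left as it is.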
volatile-lfu eviction policy can be used to expire
expire time. TTLs need to be periodically updated by the
bayes expiry module. This means that backing up statistics requires special procedures. If you just make a copy of the
*.rdb file, be aware that it has a “shelf-life”. If you restore it after the
expire time has passed, all tokens will have expired.
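A quick way to reason about the "shelf-life" of a copy is to compare its age with the expire setting. The sketch below is a hypothetical helper, not part of any tool; it assumes every token in the copy carries a TTL of at most expire seconds, and the function name and dates are made up for illustration.

```python
from datetime import datetime, timedelta


def backup_is_stale(backup_time: datetime,
                    restore_time: datetime,
                    expire_seconds: int) -> bool:
    """True if the copy is at least `expire` old, so every token TTL has run out."""
    return restore_time - backup_time >= timedelta(seconds=expire_seconds)


taken = datetime(2024, 1, 1)
# expire = 8640000 seconds is 100 days
print(backup_is_stale(taken, taken + timedelta(days=30), 8640000))   # False
print(backup_is_stale(taken, taken + timedelta(days=101), 8640000))  # True
```

A copy restored before it goes stale still loses the tokens whose individual TTLs have already run out, so the check is a lower bound on the damage rather than a guarantee.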
significant tokens is unnecessary if no eviction policy is configured in Redis that assumes
Make a significant token persistent if it has a TTL.
expire value if its current TTL is greater than
common token: reset the TTL to a low value (10d) if the token has a greater TTL.
Significant tokens are persistent and cannot be evicted.
To enable lazy expiration mode, add
lazy = true; to the classifier configuration.
The expiration mode for an existing statistics database can be changed in the configuration at any time. Tokens’ TTLs will be adjusted as necessary during the next expiry cycle.
If the new expire value is lower than the current one, TTLs greater than the new
expire value will be changed during the next expiry cycle.
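To make the rule for lowering expire concrete, the sketch below picks out the tokens that would be affected. It is an illustration only: the function name and token names are hypothetical, and it shows just the selection, not the value the TTLs are changed to.

```python
def ttls_to_change(ttls: dict, new_expire: int) -> dict:
    """Pick the tokens whose TTL exceeds the new, lower `expire` value."""
    return {token: ttl for token, ttl in ttls.items() if ttl > new_expire}


current = {"tok_a": 8_000_000, "tok_b": 500_000, "tok_c": 4_320_000}
print(ttls_to_change(current, 4_320_000))  # {'tok_a': 8000000}
```

Tokens whose TTL is already at or below the new value, such as tok_b and tok_c here, are not selected.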
To set an expire value greater than the current one, you first need to make tokens persistent (set
expire = -1;) and wait until at least one expiry cycle has completed.
Then you can set new