Bayesian statistics and fuzzy storage replication with multi-instance Redis backend
This tutorial presents a step-by-step guide on setting up statistics and fuzzy storage replication on FreeBSD. The configuration procedures for other operating systems are quite similar.
The tutorial focuses on a centralized model where Bayesian classifier and fuzzy storage learning occur on a single host and are then distributed among Rspamd installations in remote locations. For the sake of simplicity, the tutorial covers replication to a single replica database for each of the masters.
To achieve this, we need to replicate the bayes and fuzzy storage backend data to the remote host. Since we don't want to mirror the entire Redis cache, we should use dedicated Redis instances. It would be wise to separate the bayes and fuzzy storage as well.
We will create three Redis instances on both the master and replica sides: bayes, fuzzy, and redis for the remaining cache.
| instance | Redis socket |
|---|---|
redis | localhost:6379 |
bayes | localhost:6378 |
fuzzy | localhost:6377 |
Installation
To begin, install the databases/redis package by executing the following command:
# pkg install redis
Next, create separate working directories for the instances:
# cd /var/db/redis && mkdir bayes fuzzy && chown redis bayes fuzzy
To enable redis and its specific instances, add the following lines to the /etc/rc.conf file:
redis_enable="YES"
redis_profiles="redis bayes fuzzy"
To enable log rotation for Redis, create a newsyslog configuration file named /usr/local/etc/newsyslog.conf.d/redis.newsyslog.conf:
# logfilename [owner:group] mode count size when flags [/pid_file] [sig_num]
/var/log/redis/redis.log redis:redis 644 5 100 * J
/var/log/redis/bayes.log redis:redis 644 5 100 * J
/var/log/redis/fuzzy.log redis:redis 644 5 100 * J
Configuration
Generate the default configuration on both the master and replica hosts, which will be common for all instances:
# cp /usr/local/etc/redis.conf.sample /usr/local/etc/redis.conf
Due to security concerns, it is not advisable to expose Redis to external interfaces. Instead, configure Redis to listen on loopback interfaces and utilize stunnel to establish TLS tunnels between the replica and master hosts. However, please note that this approach also has its own security vulnerabilities. Therefore, do not set up replication if you cannot trust the users of the replica host.
Configure the listening sockets and memory limit (optional) as follows:
# diff -u1 /usr/local/etc/redis.conf.sample /usr/local/etc/redis.conf
--- /usr/local/etc/redis.conf.sample 2016-11-03 06:30:49.000000000 +0300
+++ /usr/local/etc/redis.conf 2016-11-27 13:10:43.671584000 +0300
@@ -60,3 +60,3 @@
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-bind 127.0.0.1
+bind 127.0.0.1 ::1
@@ -537,2 +537,3 @@
# maxmemory <bytes>
+maxmemory 200M
Configure the redis instance on both the master and replica hosts in a way that maintains compatibility with a single instance configuration. This ensures that if you already have a single instance database, it will continue to function properly.
/usr/local/etc/redis-redis.conf:
include /usr/local/etc/redis.conf
Master instances configuration
/usr/local/etc/redis-bayes.conf:
include /usr/local/etc/redis.conf
port 6378
pidfile /var/run/redis/bayes.pid
logfile /var/log/redis/bayes.log
dbfilename bayes.rdb
dir /var/db/redis/bayes/
maxmemory 600M
/usr/local/etc/redis-fuzzy.conf:
include /usr/local/etc/redis.conf
port 6377
pidfile /var/run/redis/fuzzy.pid
logfile /var/log/redis/fuzzy.log
dbfilename fuzzy.rdb
dir /var/db/redis/fuzzy/
If needed, the maxmemory is adjusted for specific instances according to expected database size.
Starting Redis on the master
# service redis start
Setting up encrypted tunnel using stunnel
Please refer to the Setting up encrypted tunnel using stunnel guide.
Replica instances configuration
/usr/local/etc/redis-bayes.conf:
include /usr/local/etc/redis.conf
port 6378
pidfile /var/run/redis/bayes.pid
logfile /var/log/redis/bayes.log
dbfilename bayes.rdb
dir /var/db/redis/bayes/
replicaof localhost 6478
maxmemory 600M
/usr/local/etc/redis-fuzzy.conf:
include /usr/local/etc/redis.conf
port 6377
pidfile /var/run/redis/fuzzy.pid
logfile /var/log/redis/fuzzy.log
dbfilename fuzzy.rdb
dir /var/db/redis/fuzzy/
replicaof localhost 6477
As replicas do not connect to masters directly, stunnel's sockets are specified in the replicaof directives.
Starting Redis on the replica
# service redis start
Checking
Check replica instances logs. If resynchronization with the masters was successful, you are done.
Rspamd configuration on the master
On the master side configure Rspamd to use distinct Redis instances respectively:
local.d/redis.conf:
servers = "localhost";
local.d/classifier-bayes.conf:
backend = "redis";
servers = "localhost:6378";
override.d/worker-fuzzy.inc:
backend = "redis";
servers = "localhost:6377";
Rspamd configuration on the replica
On the replica side Rspamd should use local redis instance for both reading and writing as it is not replicated.
local.d/redis.conf:
servers = "localhost";
Since local bayes and fuzzy Redis instances are replicas, Rspamd should use them for reading, but write to the replication master.
local.d/classifier-bayes.conf:
backend = "redis";
read_servers = "localhost:6378";
write_servers = "localhost:6478";
override.d/worker-fuzzy.inc:
backend = "redis";
read_servers = "localhost:6377";
write_servers = "localhost:6477";