Rspamd architecture

Overview

Rspamd is a high-performance mail processing framework built on an event-driven, non-blocking architecture. It operates as a standalone service that integrates with Mail Transfer Agents (MTAs) through a well-defined HTTP/JSON API, maintaining strict isolation from MTA internals for enhanced security.

Core Architecture

Rspamd runs as a main process that supervises several specialized worker processes:

Main Components

Main Process: Coordinates worker processes and handles configuration management
Normal Workers: Handle message processing and HTTP requests
Proxy Worker: Provides Milter protocol support and forwarding capabilities
Controller Process: Provides web interface and management API
Fuzzy Storage: Manages fuzzy hashing for near-duplicate detection

Message Processing Flow

Every message processed by Rspamd follows a structured pipeline that ensures thorough analysis while maintaining high performance.

Rules and Modules System

Rspamd uses a modular approach where functionality is implemented through modules that define rules. Each rule represents a specific message property or characteristic:

Rule Examples

SPF_ALLOW - message validated by SPF
BAYES_SPAM - statistical spam classification
NEURAL_SPAM - neural network spam detection
DKIM_VALID - valid DKIM signature
URL_PHISHING - phishing URL detected

Module Types

Rspamd supports multiple module implementations:

C Modules (Internal)

Embedded in the core binary
High-performance critical functions
Examples: DKIM, SPF, regexp

Lua Modules (External)

Full access to Rspamd Lua API
Flexible and easily customizable
Examples: multimap, rbl, phishing

Advanced Plugins

Statistical and ML-based analysis
Learn from existing symbols and patterns
Provide adaptive classification

Scoring and Actions System

Metrics and Weights

Rules are assigned weights that represent their significance in determining message classification. The cumulative score determines the recommended action.

Weight Guidelines:

Negative weights: decrease spam probability (ham indicators)
Positive weights: increase spam probability (spam indicators)
Higher absolute values: more significant rules
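
As a minimal sketch of how weights combine into a cumulative score, the example below sums the weights of the symbols that fired for a message. The weight values and the total_score helper are hypothetical and chosen only for illustration; real weights come from the Rspamd configuration.

```python
# Hypothetical symbol weights, for illustration only.
WEIGHTS = {
    "SPF_ALLOW": -0.2,     # ham indicator: negative weight
    "DKIM_VALID": -0.1,    # ham indicator: negative weight
    "BAYES_SPAM": 5.0,     # spam indicator: positive weight
    "URL_PHISHING": 7.5,   # spam indicator: positive weight
}


def total_score(symbols, weights=WEIGHTS):
    """Sum the weights of every symbol that fired for a message.

    Symbols without a configured weight contribute nothing.
    """
    return sum(weights.get(name, 0.0) for name in symbols)


if __name__ == "__main__":
    print(total_score(["SPF_ALLOW", "BAYES_SPAM"]))
```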

Action Thresholds

Based on the total score, Rspamd recommends one of several actions:

No action (score < greylist): Message is likely legitimate
Greylist (greylist ≤ score < add_header): Temporary delay recommended
Add header (add_header ≤ score < rewrite_subject): Mark as likely spam
Rewrite subject (rewrite_subject ≤ score < reject): Modify subject line
Reject (score ≥ reject): Block message completely
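
The threshold ranges above can be sketched as a simple lookup. The numeric thresholds below are assumed example values, not defaults taken from this document, and recommend_action is a hypothetical helper name.

```python
# Assumed example thresholds, ordered from the highest limit down.
THRESHOLDS = [
    ("reject", 15.0),
    ("rewrite_subject", 8.0),
    ("add_header", 6.0),
    ("greylist", 4.0),
]


def recommend_action(score, thresholds=THRESHOLDS):
    """Return the first action whose threshold the score reaches."""
    for action, limit in thresholds:
        if score >= limit:
            return action
    # Below the lowest threshold the message is treated as legitimate.
    return "no action"


if __name__ == "__main__":
    for value in (1.0, 4.0, 6.5, 9.0, 20.0):
        print(value, recommend_action(value))
```

Checking thresholds from the highest limit down means each lower bound is inclusive, which matches the ranges listed above.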

Dynamic Scoring

Advanced rules can provide dynamic weights based on confidence levels. For example:

Bayesian classifier: weight varies from 0 to maximum based on probability
Neural networks plugin: clustering-based confidence scoring
Reputation plugins: score based on IP/URL history
GPT plugin: LLM-based analysis with confidence scores
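
To illustrate the idea of a weight that varies from 0 to a maximum, the sketch below scales a rule's maximum weight by a confidence value. The linear ramp and the floor parameter are assumptions made for this example; they are not the formula used by the Bayesian classifier or any other plugin.

```python
def dynamic_weight(max_weight, confidence, floor=0.5):
    """Scale max_weight by confidence (illustrative linear ramp).

    Returns 0.0 at or below `floor` and max_weight at confidence 1.0.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    if not 0.0 <= floor < 1.0:
        raise ValueError("floor must be within [0, 1)")
    if confidence <= floor:
        return 0.0
    return max_weight * (confidence - floor) / (1.0 - floor)


if __name__ == "__main__":
    for p in (0.5, 0.75, 1.0):
        print(p, dynamic_weight(4.0, p))
```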

Statistical Classification

Rspamd employs multiple statistical and machine learning algorithms:

Bayesian Classification

OSB-Bayes algorithm with 2-word windows
Token-based message analysis
Multiple backend support (Redis, SQLite, files)
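
The following is a generic sketch of orthogonal sparse bigram (OSB) token generation, shown only to illustrate how word pairs within a window become classifier tokens. It is not Rspamd's tokenizer; the osb_tokens function and its tuple format are hypothetical.

```python
def osb_tokens(words, window=2):
    """Pair each word with the words that follow it inside the window.

    Each token is (first_word, gap, second_word), where gap is the
    distance between the two words. With window=2 only adjacent
    pairs are produced.
    """
    tokens = []
    for i, first in enumerate(words):
        for gap in range(1, window):
            if i + gap >= len(words):
                break
            tokens.append((first, gap, words[i + gap]))
    return tokens


if __name__ == "__main__":
    print(osb_tokens(["buy", "cheap", "pills"], window=2))
    print(osb_tokens(["buy", "cheap", "pills"], window=3))
```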

Neural Networks Plugin

Perceptron-based clustering using existing symbols as features
Adaptive learning from symbol patterns
Provides additional classification signals

Fuzzy Hashing

Near-duplicate detection using fuzzy hashes
Distributed fuzzy storage
Bulk message identification
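
Near-duplicate detection can be illustrated with word shingles and Jaccard similarity. This is a simplified stand-in technique chosen for clarity, not the fuzzy hashing scheme Rspamd implements; the function names and the shingle size are assumptions.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams in the text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    print(similarity("win a free prize now", "win a free prize today"))
```

Two messages that differ in a few words still share most shingles, so their similarity stays high, which is the property bulk mail identification relies on.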

Protocol and Integration

HTTP API

Rspamd communicates over HTTP with JSON payloads:

RESTful endpoints for all operations
JSON message format for requests/responses
Authentication and SSL/TLS support
Compression support (zstd) for efficient data transfer
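
As a sketch of a client-side scan request, the example below builds (but does not send) an HTTP POST to the /checkv2 endpoint using only the Python standard library. The address 127.0.0.1:11333, the build_check_request helper, and the choice of metadata headers are assumptions for illustration; adjust them to the actual deployment.

```python
import urllib.request


def build_check_request(message, base_url="http://127.0.0.1:11333",
                        sender=None, client_ip=None):
    """Build a scan request carrying the raw message as the body.

    Optional envelope metadata is passed as HTTP headers.
    """
    headers = {}
    if sender:
        headers["From"] = sender      # envelope sender
    if client_ip:
        headers["IP"] = client_ip     # connecting client address
    return urllib.request.Request(
        base_url + "/checkv2", data=message, headers=headers, method="POST"
    )


if __name__ == "__main__":
    raw = b"Subject: test\r\n\r\nHello\r\n"
    req = build_check_request(raw, sender="sender@example.com",
                              client_ip="192.0.2.1")
    print(req.get_method(), req.full_url)
    # To actually scan, a running Rspamd instance is required:
    # with urllib.request.urlopen(req) as resp:
    #     result = json.loads(resp.read())   # needs `import json`
```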

MTA Integration

Milter protocol support via proxy worker for Postfix/Sendmail
HTTP proxy mode for load balancing and forwarding
Direct HTTP integration for custom MTA setups
After-queue processing for existing mail flows

Performance Optimizations

Rule Scheduling

Rspamd orders rule execution so that a verdict is reached with as little work as possible:

Negative rules first: Prevent false positives early
Weight-based priority: Important rules execute first
Frequency optimization: Common rules get priority
Early termination: Stop processing when definitive result reached
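
The ordering heuristics above can be sketched as a sort key plus an early exit. This is an illustrative model under stated assumptions, not Rspamd's scheduler: the Rule fields, the hit_rate statistic, and the reject_at threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    weight: float
    hit_rate: float  # hypothetical: fraction of messages the rule matches


def schedule(rules):
    """Order rules: negative weights first, then by importance, then frequency."""
    return sorted(
        rules,
        key=lambda r: (r.weight >= 0, -abs(r.weight), -r.hit_rate),
    )


def run(rules, fired, reject_at=15.0):
    """Evaluate rules in scheduled order and stop once reject_at is reached.

    `fired` is the set of rule names that match the message.
    Returns the score and the names of the rules actually evaluated.
    """
    score = 0.0
    executed = []
    for rule in schedule(rules):
        executed.append(rule.name)
        if rule.name in fired:
            score += rule.weight
        if score >= reject_at:
            break  # definitive result: skip the remaining rules
    return score, executed


if __name__ == "__main__":
    rules = [Rule("A", 2.0, 0.1), Rule("WL", -5.0, 0.2),
             Rule("B", 20.0, 0.01), Rule("C", 2.0, 0.5)]
    print(run(rules, fired={"B"}))
```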

Caching and Memory Management

Redis caching for statistical data
Memory-mapped files for fast data access
Connection pooling for external services
Asynchronous I/O for non-blocking operations

Configuration and Management

Configuration System

UCL format (Universal Configuration Language)
Hierarchical structure with includes and overrides
Runtime reloading without service interruption
Validation and testing modes
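
The idea of hierarchical overrides can be illustrated with a recursive merge of nested settings. This sketch models configuration as plain Python dictionaries and is not a UCL parser; the merge_config helper and the sample keys are hypothetical.

```python
def merge_config(base, override):
    """Return a new dict where override values win, merging nested sections."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are sections: merge them key by key.
            result[key] = merge_config(result[key], value)
        else:
            # Scalars and mismatched types: the override replaces the base.
            result[key] = value
    return result


if __name__ == "__main__":
    base = {"actions": {"reject": 15, "greylist": 4}, "workers": 2}
    local = {"actions": {"reject": 20}}
    print(merge_config(base, local))
```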

Signal Management

SIGHUP: Graceful restart with config reload
SIGTERM: Clean shutdown
SIGUSR1: Log file rotation
SIGUSR2: Worker respawn
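
A small helper can map the operations above to their signals. This sketch assumes a Unix-like system and that the caller already knows the main process PID; the operation names and the send_signal function are hypothetical.

```python
import os
import signal

# Operation names are illustrative; the signals follow the list above.
SIGNALS = {
    "reload": signal.SIGHUP,            # graceful restart with config reload
    "stop": signal.SIGTERM,             # clean shutdown
    "reopen_logs": signal.SIGUSR1,      # log file rotation
    "respawn_workers": signal.SIGUSR2,  # worker respawn
}


def send_signal(pid, operation):
    """Send the signal for the named operation to the given process."""
    os.kill(pid, SIGNALS[operation])


if __name__ == "__main__":
    for name, sig in SIGNALS.items():
        print(name, sig.name)
```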

Extensibility

Lua Framework

Rspamd provides a comprehensive Lua API enabling:

Custom rule development
External service integration
Protocol extensions
Advanced message manipulation
Machine learning model integration

Plugin Architecture

Composable functionality through plugins
Event-driven hooks for message processing stages
Shared data structures between modules
Runtime module loading/unloading
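
Event-driven hooks for processing stages can be modeled as a registry of callbacks keyed by stage. The sketch below is a conceptual illustration in Python, not the Rspamd Lua API; the HookRegistry class, the stage names, and the dictionary used as shared task data are assumptions.

```python
from collections import defaultdict


class HookRegistry:
    """Keep callbacks per processing stage and run them in registration order."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, stage, callback):
        self._hooks[stage].append(callback)

    def run(self, stage, task):
        for callback in self._hooks[stage]:
            callback(task)


if __name__ == "__main__":
    registry = HookRegistry()
    # The task dict stands in for data shared between modules.
    registry.register("prefilter", lambda t: t.setdefault("symbols", []))
    registry.register("filter", lambda t: t["symbols"].append("EXAMPLE_RULE"))
    task = {}
    for stage in ("prefilter", "filter", "postfilter"):
        registry.run(stage, task)
    print(task)
```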
