Rspamd architecture
Overview
Rspamd is a high-performance mail processing framework built on an event-driven, non-blocking architecture. It operates as a standalone service that integrates with Mail Transfer Agents (MTAs) through a well-defined HTTP/JSON API, maintaining strict isolation from MTA internals for enhanced security.
Core Architecture
Rspamd's architecture consists of several key components working together to provide comprehensive mail processing capabilities:
Main Components
- Main Process: Coordinates worker processes and handles configuration management
- Normal Workers: Handle message processing and HTTP requests
- Proxy Worker: Provides Milter protocol support and forwarding capabilities
- Controller Process: Provides web interface and management API
- Fuzzy Storage: Manages fuzzy hashing for near-duplicate detection
Message Processing Flow
Every message processed by Rspamd follows a structured pipeline that ensures thorough analysis while maintaining high performance.
Rules and Modules System
Rspamd uses a modular approach where functionality is implemented through modules that define rules. Each rule represents a specific message property or characteristic:
Rule Examples
SPF_ALLOW
- message validated by SPFBAYES_SPAM
- statistical spam classificationNEURAL_SPAM
- neural network spam detectionDKIM_VALID
- valid DKIM signatureURL_PHISHING
- phishing URL detected
Module Types
Rspamd supports multiple module implementations:
C Modules (Internal)
- Embedded in the core binary
- High-performance critical functions
- Examples: DKIM, SPF, regexp
Lua Modules (External)
- Full access to Rspamd Lua API
- Flexible and easily customizable
- Examples: multimap, rbl, phishing
Advanced Plugins
- Statistical and ML-based analysis
- Learn from existing symbols and patterns
- Provide adaptive classification
Scoring and Actions System
Metrics and Weights
Rules are assigned weights that represent their significance in determining message classification. The cumulative score determines the recommended action.
Weight Guidelines:
- Negative weights: decrease spam probability (ham indicators)
- Positive weights: increase spam probability (spam indicators)
- Higher absolute values: more significant rules
Action Thresholds
Based on the total score, Rspamd recommends one of several actions:
- No action (score < greylist): Message is likely legitimate
- Greylist (greylist ≤ score < add_header): Temporary delay recommended
- Add header (add_header ≤ score < rewrite_subject): Mark as likely spam
- Rewrite subject (rewrite_subject ≤ score < reject): Modify subject line
- Reject (score ≥ reject): Block message completely
Dynamic Scoring
Advanced rules can provide dynamic weights based on confidence levels. For example:
- Bayesian classifier: weight varies from 0 to maximum based on probability
- Neural networks plugin: clustering-based confidence scoring
- Reputation plugins: score based on IP/URL history
- GPT plugin: LLM-based analysis with confidence scores
Statistical Classification
Rspamd employs multiple statistical and machine learning algorithms:
Bayesian Classification
- OSB-Bayes algorithm with 2-word windows
- Token-based message analysis
- Multiple backend support (Redis, SQLite, files)
Neural Networks Plugin
- Perceptron-based clustering using existing symbols as features
- Adaptive learning from symbol patterns
- Provides additional classification signals
Fuzzy Hashing
- Near-duplicate detection using fuzzy hashes
- Distributed fuzzy storage
- Bulk message identification
Protocol and Integration
HTTP API
Rspamd communicates through HTTP/JSON protocol:
- RESTful endpoints for all operations
- JSON message format for requests/responses
- Authentication and SSL/TLS support
- Compression support (zstd) for efficient data transfer
MTA Integration
- Milter protocol support via proxy worker for Postfix/Sendmail
- HTTP proxy mode for load balancing and forwarding
- Direct HTTP integration for custom MTA setups
- After-queue processing for existing mail flows
Performance Optimizations
Rule Scheduling
Rspamd optimizes rule execution through intelligent scheduling:
- Negative rules first: Prevent false positives early
- Weight-based priority: Important rules execute first
- Frequency optimization: Common rules get priority
- Early termination: Stop processing when definitive result reached
Caching and Memory Management
- Redis caching for statistical data
- Memory-mapped files for fast data access
- Connection pooling for external services
- Asynchronous I/O for non-blocking operations
Configuration and Management
Configuration System
- UCL format (Universal Configuration Language)
- Hierarchical structure with includes and overrides
- Runtime reloading without service interruption
- Validation and testing modes
Signal Management
- SIGHUP: Graceful restart with config reload
- SIGTERM: Clean shutdown
- SIGUSR1: Log file rotation
- SIGUSR2: Worker respawn
Extensibility
Lua Framework
Rspamd provides a comprehensive Lua API enabling:
- Custom rule development
- External service integration
- Protocol extensions
- Advanced message manipulation
- Machine learning model integration
Plugin Architecture
- Composable functionality through plugins
- Event-driven hooks for message processing stages
- Shared data structures between modules
- Runtime module loading/unloading