Major Release with Load-Aware Upstreams, MX Check Rework, Dynamic Composites, and Hardening
- `mx_check` replaces its single domain-keyed cache with a three-layer Redis design (`d:<domain>` / `m:<mxhost>` / `i:<ip>` under `<key_prefix>:`), so two domains sharing an MX host (every G-Suite/M365 tenant, every ESP customer) reuse the m- and i-layer entries and hit cache with zero new DNS or TCP work. Outcome symbols are finer-grained and `MX_NONE` now replaces the old `MX_NXDOMAIN`/`MX_MISSING`; the probe is split into clean connect-only and full SMTP-banner shapes with multi-line greeting support. Operators should review their `mx_check` scores and any rules referencing the old symbol names (#6055, #6032)
- `rspamd.com` fuzzy rule now uses `service=fuzzy+rspamd.com` SRV-based discovery instead of a hardcoded round-robin host list, so backends and ports are managed entirely in DNS. The legacy `fuzzy1`/`fuzzy2` hostnames keep resolving to every live backend, so installs that pinned the old string are unaffected
- `headers` sub-object in the multipart metadata part is injected into the task request headers (so `task:get_request_header()` works for custom fields), and the full metadata object is exposed to Lua via `task:get_metadata()` / `task:get_metadata_field(key)`. Both paths travel in the multipart body, free of the 80KB HTTP header limit that v2 hits; `rspamc` gains a repeatable `--metadata-header KEY=VALUE` option (#6074)
- `dynamic` key in the `composites { ... }` block attaches a hot-reloadable map of composites (file, URL list, or signed map) using the same vocabulary as static composites.
  Reloads register new names with the symcache, materialise removed names as stubs, and swap an atomic generation so in-flight tasks keep their snapshot (#6064)
- `task:has_symbol()` / `task:get_symbol()` accept a `{S1, ..., Sn}` table form, and new `task:has_symbol_regexp(re)` / `task:get_symbol_regexp(re)` match fired symbol names against an `rspamd_regexp` userdata
- `rspamd_tcp.new` gains `connect_timeout` / `read_timeout` / `write_timeout` for independent per-phase budgets and an `on_error(err, conn)` callback that fires once for pre-connect failures (DNS, socket, connect refused/timeout, SSL handshake). Legacy single-timeout callers keep their existing deducted-budget contract unchanged (#6034)
- `RSPAMD_UPSTREAM_RANDOM` callers are transparently upgraded to P2C (#6013)
- `struct upstream` with its own error budget, weight, latency EWMA, and address list, so SRV weights are honoured and a single failing target no longer drains the whole cluster's budget (#6030)
- `mx_check` partitions resolved MX-target IPs into `PUBLIC` / `LOCAL` / `BOGON` (only `PUBLIC` is probed), adds `bad_mxs` (glob on hostnames) and `bad_ips` (radix on IPs) punishment maps with optional per-entry weight multipliers, per-source checks across envelope / reply-to / MIME-from, and `check_authorized` / `check_local` run-scope toggles (#6039, #6032)
- `<RULE>_CHECK` callback symbol so it is a predictable dependency target regardless of how the scan-result symbols are named, generalising the existing `VADE_CHECK` / `CLOUDMARK_CHECK` pattern
- `url_redirector` gains a per-hop Redis cache with intermediate-hop injection, so shared intermediate links are reused across chains and resolved hops are made visible to downstream modules; the walk self-heals on partial cache misses and surfaces previously-silent lock/stale conditions (#6014)
- `url_redirector` replaces its flat `default_ua` list with five coherent browser profiles (Chrome, Edge, Firefox, Safari), each bundling a User-Agent with the exact header set, values, and order that browser sends (including `sec-ch-ua` client
  hints where appropriate). One profile is picked per task and reused across every hop (#6053)
- `redirector_hosts_map` now accepts glob patterns (e.g. `*.bit.ly`, `*.t.co`); bare hostnames still match exactly, so existing maps are unaffected (#6056)
- `url_redirector` can issue GET (rather than HEAD) for an operator-defined list of URLs matched by regexp (#6043)
- `RSPAMD_HTTP_FLAG_ORDERED_HEADERS` emits headers in insertion order instead of hash order; `lua_http` accepts a list form (`{{'name','value'}, ...}`) to preserve order, used by the redirector stealth profiles
- `elastic` module now logs Reply-To user/domain, received IPs, per-URL and CTA URL metadata via a new `collect_urls` block, and the forcing module name from `task:has_pre_result()` (#6018)
- `extra_columns` presets, including an initial `outbound` preset, can be selected without hand-rolling the schema (#5983)
- `fuzzy_digest`, `fuzzy_shingles`, `authenticated`, and `received_count` (#5981)
- `video`/`audio`/`source`/`track`/`picture`/`svg`, so their URLs and structure are visible to the parser; new tag IDs are appended so existing IDs stay stable
- `rspamadm dmarc_report` gains `-w`/`--batch-wait` to pause between batches (and stagger report generation), avoiding overload of the SMTP server and a weak resolver (#5985)
- `--sort-by <col>` and `--group` options for the `rspamadm autolearnstats` table output (#6050)
- `lua_feedback_parsers` lualib for parsing DSN (delivery status) and ARF (abuse) reports (#5982)
- `lua_extras` provides a two-phase loader for custom selectors, maps, and regexps under `lua.local.d/{maps,selectors,regexps}/`, resolving cross-kind dependencies (maps → selectors → regexps) so a selector can consume a map registered by an earlier kind (#6020)
- `lua_scanners` engine for the eXpurgate anti-spam service (#5755)
- `rspamadm control memstat` command reports per-worker RSS, per-callsite mempool counters, Lua heap usage, and structured per-arena jemalloc stats, with `--short`, `--sort`, and per-section toggle flags (#6016)
- `RSPAMD_PIDFILE`, `RSPAMD_LOG_TYPE`, `RSPAMD_LOG_FILE`); an empty
  pidfile disables it (useful as PID 1), and `RSPAMD_LOG_TYPE=console` logs to stdout. Stock installs render the previous defaults bit-for-bit (#6067)
- `fasttext_model` is configured, the shipped `$SHAREDIR/languages/fasttext_model.ftz` is loaded if present, so images bundling the model can drop the explicit override; stock installs without the file behave exactly as before (#6067)
- `fpconv` gains `%.Nf` fixed-point formatting with correct rounding and carry handling (#6061)
- `NOSTAT`; a composite is now deferred only when a dependency actually resolves to the postfilter stage, restoring `task:get_groups()`/`get_symbols()` visibility from postfilters
- `200/1h` plus `30/1m`) keyed every bucket on the selector value alone, so only the last bucket was tracked and the other limits were silently ignored; each bucket now gets a distinct Redis key (#6076, #6059)
- `url:get_raw()` returned the partially percent-decoded scratch buffer for HTML URLs; `url->raw` now points at a mempool-owned copy of the verbatim trimmed `href` (#5986)
- `mailto:` URL were extracted as two separate emails; both injection sites now canonicalise to the slash-less `mailto:` form (RFC 6068) so dedup collapses them
- `https://legit.com @evil.com/...` userinfo-obfuscation phishing pattern; the cap is raised to 16KiB and the URL is flagged `obscured` as soon as userinfo crosses 64 bytes
- `url_suspect` now requires a TLD of at least 3 chars for `word_dot` naked-domain matches, so prose like "pale blue dot so insignificant" no longer normalises to `blue.so`; explicit-protocol patterns still match two-char TLDs
- `lua_mime.modify_headers` honoured its `order` list but serialised in hash order; ARC sets are now emitted in the conventional `ARC-Seal` / `ARC-Message-Signature` / `ARC-Authentication-Results` layout, which some validators (e.g.
  O365) require (#6052, #6045)
- `dkim=permerror` in `Authentication-Results` instead of falling through to `dkim=none` (#6028, #5957)
- `INVALID_MSGID` rule now honours `mime_utf8`, and the `enable_mime_utf8` option spelling is registered as an alias so it actually takes effect, fixing false positives on valid SMTPUTF8 Message-IDs (RFC 6532) (#6011, #6007)
- `cmd_session` state (extensions buffer, key refs) leaked on every frame after the first on a persistent TCP connection; per-command state caching is dropped so each frame starts clean (#6001)
- `disable_symbols_input` (keyed on the providers config rather than the unrelated symbol catalogue), and training is retargeted to the newest profile so inference no longer goes dark for weeks until a fresh model trains (#6041)
- `metadata.settings` on `/checkv3` were stashed directly on the task and skipped the apply pipeline; they now run through the same `settings.lua` apply path as v2, so action thresholds, symbols, subject, variables, and header edits take effect (#5999)
- `task:get_request_header()` works under v3, restoring v2 behaviour (#5998)
- `rspamd_upstream_get_random` looped forever when the only alive candidate matched the `except` argument; the empty and single-survivor cases are now front-gated
- `rspamadm` tools never resolved the current Redis master, round-robining writes that failed `READONLY` on replicas; a new one-shot `lua_redis.prepare_redis_setup` resolves the master (and loads scripts) for tools like `dmarc_report` (#6015, #6009)
- `lua_util.newdeque()`; it now resets the buffer via the local `Queue` class
- `os.date`, which PUC-Rio Lua rejects for non-integer floats (only seen on the Fedora build)
- `lua_redis` as the Redis connection timeout; a separate `redis_timeout` (default 1.0s) is introduced and propagated into nested `redis{}` blocks (#5977)
- `rspamadm vault list` produced empty output for large vaults because the payload passed through a format-string logger; it is now written to stdout directly via `io.write` (#6006, #6005)
- `x-binaryenc` charset across all detection paths,
  silencing the spurious "cannot open converter" warning and correctly marking the part as raw binary (#5984)
- `configtest` now warns when `task_timeout` is less than a symcache symbol timeout, including per-worker overrides (#5978)
- `string_view::data()` for pointer access where libc++ returns a wrapped iterator from `begin()`, fixing builds with libc++ 22 on FreeBSD (#5969)
- `mime_headers`/`mime_encoding` now recompute lengths after in-place strip/trim rewrites, so stale trailing bytes are no longer pulled into the Message-ID or normalised charset name, and large-buffer offsets use `goffset` to avoid 32-bit truncation
- `pkcs7-data` OCTET STRING crashed the parser on the first byte check; the empty inner recursion is skipped and a defensive guard is added against NULL/empty buffers
- `application/pkcs7-mime` layers re-entered the parser without incrementing the nesting counter, recursing to a depth bounded only by message size and exhausting the worker stack; the S/MIME re-entry is now accounted against `max_nested` and the CMS/PKCS7/BIO objects are freed on the error path
- `begin-base64` UUE prefix offset
- `rspamd_string_find_eoh` peeked `p[1]` guarded only by `p < end`, reading one byte past the buffer on input ending in `\r\r`; it now checks `p + 1 < end`
- `spf2.`
  advanced past one unvalidated byte and read past the string end; the parser now advances only past the validated prefix with short-circuiting checks
- `rdns_parse_labels` never verified that label data fits within the packet, so a reply declaring more bytes than remained made the second-pass `memcpy` read past the (exactly-sized, on TCP) buffer; both plain and compressed labels are now validated, with an off-by-one fix in offset decompression
- `nGt`, `nLt`, `nvap`); the replacement is now bounds-checked against the remaining buffer
- `rspamd_url_maybe_regenerate_from_ip` could read `host[-1]` on an all-dots or zero-length-after-decode host; the trailing-dot loop condition is reordered and host length is re-checked after decoding
- `msg_namelen` before every `recvmsg` to avoid parsing stale stack bytes, validate the reconstructed 14-bit TCP frame length before use, and clamp `n_extra_flags` before the fixed-size reply `memcpy`

This major release reworks load-aware upstream selection (Power of Two Choices, latency EWMA, slow start, per-target SRV, deferred DNS) and the `mx_check` module (three-layer cache, finer outcome symbols, IP-class classification, trust maps), adds hot-reloadable dynamic composites, richer `/checkv3` metadata, phase-specific TCP timeouts, stalled-scan diagnostics, and numerous new features across logging, selectors, and tooling. It also lands an extensive round of memory-safety and DoS hardening across the MIME, archive, URL, DNS, HTML, SPF, and fuzzy-storage paths. Recommended upgrade for all users; operators of `mx_check` should review symbol names and scores, and Sentinel/SRV deployments will benefit from the upstream resilience improvements.
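
For readers unfamiliar with the load-aware selection named above, the general idea behind Power of Two Choices combined with a latency EWMA can be sketched as follows. This is an illustrative Python sketch of the textbook technique, not rspamd's C implementation: the `Upstream` class, `observe`, `pick_p2c`, and the smoothing factor `alpha = 0.2` are all hypothetical names and values chosen for the example, and slow start, weights, and error budgets are left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Upstream:
    """Hypothetical stand-in for one backend; not rspamd's struct upstream."""
    name: str
    latency_ewma: float = 0.0  # smoothed latency in seconds; 0.0 means no samples yet
    alive: bool = True

    def observe(self, sample: float, alpha: float = 0.2) -> None:
        """Fold one latency sample into the EWMA (alpha is an assumed smoothing factor)."""
        if self.latency_ewma == 0.0:
            # First sample seeds the average directly.
            self.latency_ewma = sample
        else:
            self.latency_ewma = alpha * sample + (1.0 - alpha) * self.latency_ewma


def pick_p2c(upstreams, rng=random):
    """Power of Two Choices: draw two distinct alive upstreams, keep the faster one."""
    alive = [u for u in upstreams if u.alive]
    if not alive:
        return None        # nothing to choose from
    if len(alive) == 1:
        return alive[0]    # a single survivor needs no comparison
    a, b = rng.sample(alive, 2)
    return a if a.latency_ewma <= b.latency_ewma else b


if __name__ == "__main__":
    pool = [Upstream("a"), Upstream("b"), Upstream("c")]
    for upstream, latency in zip(pool, (0.010, 0.050, 0.200)):
        upstream.observe(latency)
    print(pick_p2c(pool).name)
```

Comparing only two random candidates keeps each selection cheap while still steering traffic away from slow backends; the slowest upstream in the pool can never win a comparison, yet no global sort or shared minimum has to be maintained.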