Email Analysis & Debugging

These commands help you inspect and analyze email messages to understand how Rspamd processes them.

mime extract

Extract content from MIME messages for analysis.

Purpose

Extract plain text, HTML, words, or structural information from email messages to understand what content Rspamd sees and processes.

Common Scenarios

Debug Text Extraction Issues

When investigating why certain text isn't being analyzed:

# Extract plain text content
rspamadm mime extract -t message.eml

# Extract HTML content
rspamadm mime extract -H message.eml

# Extract in different formats
rspamadm mime extract -t -o decoded message.eml # Decoded with charset
rspamadm mime extract -t -o decoded_utf message.eml # UTF-8 normalized
rspamadm mime extract -t -o oneline message.eml # Single line
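
Because some content may exist in only one part, it can help to save the text and HTML views next to each other and compare them with ordinary tools. The following is a minimal sketch, not an official workflow: the function name `extract_views` and the output file names are hypothetical, and only the `-t` and `-H` switches shown above are assumed.

```shell
# Hypothetical helper: save the plain-text and HTML views of one message
# so they can be inspected with diff, grep, or less.
extract_views() (
    msg="$1"
    outdir="${2:-views}"          # illustrative default directory name
    [ -r "$msg" ] || { echo "cannot read $msg" >&2; exit 1; }
    mkdir -p "$outdir" || exit 1
    # -t and -H are the text/HTML switches documented above
    rspamadm mime extract -t "$msg" > "$outdir/text.txt"
    rspamadm mime extract -H "$msg" > "$outdir/html.txt"
    # Print byte counts so an empty view stands out immediately
    wc -c "$outdir/text.txt" "$outdir/html.txt"
)
```

For example, `extract_views message.eml out` leaves `out/text.txt` and `out/html.txt` behind for comparison.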

Analyze Word Extraction

To see exactly what words/tokens are extracted:

# Extract stemmed words (as used by Bayes)
rspamadm mime extract -w message.eml

# Extract normalized words
rspamadm mime extract -w -F norm message.eml

# Extract raw words (no processing)
rspamadm mime extract -w -F raw message.eml

# Extract full word information
rspamadm mime extract -w -F full message.eml

Inspect Message Structure

View part information and HTML structure:

# Show part information
rspamadm mime extract -p message.eml

# Show HTML structure (tags, attributes)
rspamadm mime extract -H -s message.eml

# Show invisible HTML content
rspamadm mime extract -H -i message.eml

Options

-t, --text              Extract plain text
-H, --html              Extract HTML
-o, --output <type>     Output format: raw, content, oneline, decoded, decoded_utf
-w, --words             Extract words
-F, --words-format      Word format: stem, norm, raw, full
-p, --part              Show part information
-s, --structure         Show HTML structure
-i, --invisible         Show invisible HTML content

mime stat

Extract statistical data from messages.

Purpose

Extract the same tokens and hashes that Rspamd uses for Bayes classification and fuzzy matching.

Common Scenarios

Debug Bayes Classification

See what tokens would be used for Bayes:

# Extract Bayes tokens
rspamadm mime stat -b message.eml

This shows the exact tokens that would be checked against the Bayes database, helping you understand why a message is classified as spam or ham.

Generate Fuzzy Hashes

Extract fuzzy hashes for debugging fuzzy matching:

# Extract fuzzy hashes
rspamadm mime stat -F message.eml

# Include shingles (detailed hash information)
rspamadm mime stat -F -s message.eml

Extract Lua Metatokens

View metatokens generated by Lua code:

rspamadm mime stat -m message.eml

Options

-m, --meta              Lua metatokens
-b, --bayes             Bayes tokens
-F, --fuzzy             Fuzzy hashes
-s, --shingles          Show shingles for fuzzy hashes

Use Cases

  • Debug why Bayes is not working as expected
  • Generate fuzzy hashes for manual comparison
  • Understand what statistical features are extracted
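
For the manual comparison case, one possible approach is to compare the fuzzy output of two messages line by line. This is only a sketch under assumptions: the function name `common_fuzzy` is hypothetical, and it assumes that `rspamadm mime stat -F` prints line-oriented output in which identical hashes produce identical lines. If the real output embeds the file name in each line, that part would need to be stripped first.

```shell
# Hypothetical helper: list fuzzy-hash output lines shared by two messages.
# Lines are compared verbatim; no assumption is made about their fields.
common_fuzzy() (
    a=$(mktemp) && b=$(mktemp) || exit 1
    rspamadm mime stat -F "$1" | sort -u > "$a"
    rspamadm mime stat -F "$2" | sort -u > "$b"
    # comm -12 keeps only the lines present in both sorted inputs
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
)
```

Usage: `common_fuzzy first.eml second.eml`. Empty output means no identical lines were found.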

mime urls

Extract and analyze URLs from messages.

Purpose

Extract URLs as Rspamd sees them, useful for debugging URL-based rules and understanding how URLs are processed.

Common Scenarios

Extract All URLs

# Show full URL information
rspamadm mime urls -f message.eml

This shows URLs as processed by Rspamd, including normalized forms and components.

Get Unique Hosts

# Extract unique hostnames
rspamadm mime urls -H -u message.eml

# Sort by frequency
rspamadm mime urls -H -u --count -s message.eml

Check TLD Distribution

# Extract TLDs only
rspamadm mime urls -t -u message.eml

# Count TLD occurrences
rspamadm mime urls -t --count -s message.eml

Analyze Most Common Domains

# Show host counts in reverse order (most common first)
rspamadm mime urls -H -u --count -s -r message.eml
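
To rank hosts across a whole directory of messages rather than a single file, the per-message output can be aggregated with standard tools. A minimal sketch, assuming that `rspamadm mime urls -H -u` prints one host per line; the function name `top_hosts` and the `.eml` naming convention are illustrative.

```shell
# Hypothetical helper: rank hosts by the number of messages that contain them.
# $1 = directory with .eml files (default: current directory)
# $2 = how many hosts to show (default: 10)
top_hosts() (
    dir="${1:-.}"
    for f in "$dir"/*.eml; do
        [ -f "$f" ] || continue   # skip the literal pattern when nothing matches
        # -u makes each host appear at most once per message
        rspamadm mime urls -H -u "$f"
    done | sort | uniq -c | sort -rn | head -n "${2:-10}"
)
```

Since each message contributes a host at most once, the count in the first column is the number of messages that mention that host.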

Options

-t, --tld               Get TLDs only
-H, --host              Get hosts only
-f, --full              Show piecewise URLs as processed
-u, --unique            Print only unique URLs
-s, --sort              Sort output
    --count             Print count of each element
-r, --reverse           Reverse sort order

Use Cases

  • Debug why a URL rule isn't matching
  • Analyze phishing emails to find URL patterns
  • Check if URL redirects are being followed
  • Verify URL normalization

mime dump

Dump messages in various formats.

Purpose

Export message content in different structured formats for further processing or analysis.

Common Scenarios

Export for External Processing

# Dump as JSON
rspamadm mime dump -j message.eml

# Dump as UCL
rspamadm mime dump -U message.eml

# Dump as MessagePack
rspamadm mime dump -M message.eml

# Compact output
rspamadm mime dump -j -C message.eml

Process Multiple Messages

# Omit filenames from the output so it can be piped or redirected
rspamadm mime dump -j --no-file message1.eml message2.eml > output.json
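
Redirecting several messages into one file places the dumps one after another, which a strict JSON parser may not accept as a single document. One way around this is to write one file per message. This is a sketch, not the tool's prescribed usage: the function name `dump_each` and the output naming scheme are hypothetical, and only the `-j`, `-C`, and `--no-file` switches from this section are used.

```shell
# Hypothetical helper: write one compact JSON file per message.
# $1 = output directory, remaining arguments = message files
dump_each() (
    outdir="$1"; shift
    mkdir -p "$outdir" || exit 1
    for msg in "$@"; do
        name=$(basename "$msg" .eml)
        # Report a failed dump but keep processing the remaining messages
        rspamadm mime dump -j -C --no-file "$msg" > "$outdir/$name.json" ||
            echo "dump failed for $msg" >&2
    done
)
```

Usage: `dump_each dumps message1.eml message2.eml` produces `dumps/message1.json` and `dumps/message2.json`.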

Options

-j, --json              JSON output
-U, --ucl               UCL output
-M, --messagepack       MessagePack output
-C, --compact           Compact format
    --no-file           Don't print filename

Use Cases

  • Export message data for analysis in other tools
  • Generate structured data for machine learning
  • Debug message parsing issues

Practical Examples

Complete Message Analysis

To fully understand how Rspamd processes a message:

#!/bin/bash
MESSAGE="suspicious.eml"

echo "=== Text Content ==="
rspamadm mime extract -t "$MESSAGE"

echo -e "\n=== URLs ==="
rspamadm mime urls -f "$MESSAGE"

echo -e "\n=== Bayes Tokens ==="
rspamadm mime stat -b "$MESSAGE"

echo -e "\n=== Fuzzy Hashes ==="
rspamadm mime stat -F "$MESSAGE"

echo -e "\n=== Message Structure ==="
rspamadm mime extract -p "$MESSAGE"

Compare Word Extraction Methods

#!/bin/bash
MESSAGE="test.eml"

echo "Stemmed words (Bayes):"
rspamadm mime extract -w -F stem "$MESSAGE" | head -20

echo -e "\nNormalized words:"
rspamadm mime extract -w -F norm "$MESSAGE" | head -20

echo -e "\nRaw words:"
rspamadm mime extract -w -F raw "$MESSAGE" | head -20
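
Reading three listings side by side is error-prone, so a diff between two formats can make the effect of stemming or normalization easier to see. A minimal sketch: the function name `diff_words` is hypothetical, and the format names are the `-F` values listed under Options.

```shell
# Hypothetical helper: show where two word formats differ for one message.
# $1 = message, $2 and $3 = word formats (default: stem and norm)
diff_words() (
    msg="$1"; fmt_a="${2:-stem}"; fmt_b="${3:-norm}"
    a=$(mktemp) && b=$(mktemp) || exit 1
    rspamadm mime extract -w -F "$fmt_a" "$msg" > "$a"
    rspamadm mime extract -w -F "$fmt_b" "$msg" > "$b"
    # diff exits with status 1 when the outputs differ; that is expected here
    diff "$a" "$b"; status=$?
    rm -f "$a" "$b"
    exit "$status"
)
```

Usage: `diff_words test.eml raw stem`. Lines prefixed with `<` come from the first format and lines prefixed with `>` from the second.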

URL Analysis for Phishing Detection

# Find all unique domains in a phishing email
rspamadm mime urls -H -u phishing.eml

# Check whether a well-known brand name appears in any URL (possible spoofing)
rspamadm mime urls -f phishing.eml | grep -iE "paypal|bank|amazon"

# Count output lines as a rough total of extracted URLs
rspamadm mime urls phishing.eml | wc -l
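
Keyword matching finds brand names but does not tell you which hosts are unexpected. A complementary sketch is to filter the extracted hosts against a list of hosts you already trust. The function name `unknown_hosts` and the allowlist file (one exact hostname per line) are illustrative conventions, and the sketch assumes `rspamadm mime urls -H -u` prints one host per line.

```shell
# Hypothetical helper: print hosts from a message that are not in an allowlist.
# $1 = message, $2 = allowlist file with one exact hostname per line
unknown_hosts() (
    msg="$1"; allow="$2"
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
    rspamadm mime urls -H -u "$msg" | grep -Fxv -f "$allow"
)
```

Usage: `unknown_hosts phishing.eml trusted-hosts.txt`. Because matching is exact, a lookalike such as `paypa1-login.example` is reported even when `paypal.com` is on the list.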

Tips and Best Practices

  1. Use -o decoded_utf for internationalized content - The extracted text is normalized to UTF-8
  2. Combine with grep - Pipe output to grep for specific patterns
  3. Check both text and HTML - Some content may only appear in HTML parts
  4. Use --count for frequency analysis - Helps identify patterns in bulk analysis
  5. Export as JSON - For programmatic processing, JSON format is most versatile