Email Analysis & Debugging
These commands help you inspect and analyze email messages to understand how Rspamd processes them.
mime extract
Extract content from MIME messages for analysis.
Purpose
Extract plain text, HTML, words, or structural information from email messages to understand what content Rspamd sees and processes.
Common Scenarios
Debug Text Extraction Issues
When investigating why certain text isn't being analyzed:
# Extract plain text content
rspamadm mime extract -t message.eml
# Extract HTML content
rspamadm mime extract -H message.eml
# Extract in different formats
rspamadm mime extract -t -o decoded message.eml # Decoded, kept in the original charset
rspamadm mime extract -t -o decoded_utf message.eml # Decoded and converted to UTF-8
rspamadm mime extract -t -o oneline message.eml # Content collapsed onto a single line
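The output formats are easier to compare side by side. The sketch below writes the plain text of one message in every format listed under Options, one file per format, so you can diff them. The function name save_text_formats and the text.<format> file naming are hypothetical.

```shell
# Hypothetical helper: save the plain-text part of a message in each -o format
# (raw, content, oneline, decoded, decoded_utf) so the variants can be diffed.
save_text_formats() {
  local msg="$1" outdir="$2" fmt
  mkdir -p "$outdir"
  for fmt in raw content oneline decoded decoded_utf; do
    # -t selects plain text, -o selects the output format
    rspamadm mime extract -t -o "$fmt" "$msg" > "$outdir/text.$fmt"
  done
}
```

For example, run save_text_formats message.eml /tmp/fmt and then diff /tmp/fmt/text.decoded /tmp/fmt/text.decoded_utf. This shows what charset conversion changed.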
Analyze Word Extraction
To see exactly what words/tokens are extracted:
# Extract stemmed words (as used by Bayes)
rspamadm mime extract -w message.eml
# Extract normalized words
rspamadm mime extract -w -F norm message.eml
# Extract raw words (no processing)
rspamadm mime extract -w -F raw message.eml
# Extract full word information
rspamadm mime extract -w -F full message.eml
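To see which tokens stemming actually produces, you can diff the normalized and stemmed word lists. This is a minimal sketch. It assumes the -w output separates words with whitespace, so any extra decoration your rspamadm version prints will also appear in the diff. The function name is hypothetical.

```shell
# Hypothetical helper: print words present in the stemmed list (-F stem)
# but absent from the normalized list (-F norm), i.e. forms created by stemming.
# Assumes words are whitespace-separated in the -w output.
words_changed_by_stemming() {
  local msg="$1"
  # comm -13 keeps only lines unique to the second (stemmed) list
  comm -13 \
    <(rspamadm mime extract -w -F norm "$msg" | tr -s '[:space:]' '\n' | sort -u) \
    <(rspamadm mime extract -w -F stem "$msg" | tr -s '[:space:]' '\n' | sort -u)
}
```

Both inputs are sorted with the same locale, which comm requires. The function uses process substitution, so it needs bash rather than a plain POSIX sh.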
Inspect Message Structure
View part information and HTML structure:
# Show part information
rspamadm mime extract -p message.eml
# Show HTML structure (tags, attributes)
rspamadm mime extract -H -s message.eml
# Show invisible HTML content
rspamadm mime extract -H -i message.eml
Options
-t, --text Extract plain text
-H, --html Extract HTML
-o, --output <type> Output format: raw, content, oneline, decoded, decoded_utf
-w, --words Extract words
-F, --words-format Word format: stem, norm, raw, full
-p, --part Show part information
-s, --structure Show HTML structure
-i, --invisible Show invisible HTML content
Related Commands
mime stat
Extract statistical data from messages.
Purpose
Extract the same tokens and hashes that Rspamd uses for Bayes classification and fuzzy matching.
Common Scenarios
Debug Bayes Classification
See what tokens would be used for Bayes:
# Extract Bayes tokens
rspamadm mime stat -b message.eml
This shows the exact tokens that would be checked against the Bayes database, helping you understand why a message is classified as spam or ham.
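When two messages are classified differently, it can help to measure how many Bayes tokens they share. The sketch below compares the output lines of rspamadm mime stat -b for two messages. It assumes one token per output line. If your version prints a header line per file, that line is counted as unique to each message. The function name is hypothetical.

```shell
# Hypothetical helper: count Bayes token lines shared by two messages and
# unique to each. Assumes one token per output line of `mime stat -b`.
bayes_token_overlap() {
  local a="$1" b="$2" ta tb
  ta=$(mktemp) tb=$(mktemp)
  rspamadm mime stat -b "$a" | sort -u > "$ta"
  rspamadm mime stat -b "$b" | sort -u > "$tb"
  # awk prints the line count without the padding some wc versions add
  printf 'shared: %s\n' "$(comm -12 "$ta" "$tb" | awk 'END { print NR }')"
  printf 'only in %s: %s\n' "$a" "$(comm -23 "$ta" "$tb" | awk 'END { print NR }')"
  printf 'only in %s: %s\n' "$b" "$(comm -13 "$ta" "$tb" | awk 'END { print NR }')"
  rm -f "$ta" "$tb"
}
```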
Generate Fuzzy Hashes
Extract fuzzy hashes for debugging fuzzy matching:
# Extract fuzzy hashes
rspamadm mime stat -F message.eml
# Include shingles (detailed hash information)
rspamadm mime stat -F -s message.eml
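For a small corpus, you can also look for fuzzy hash lines that occur in more than one message. This is a rough way to spot near-duplicates without querying a fuzzy storage. The sketch assumes each hash is printed on its own line with no file-name prefix, so adjust the filtering to the output your version produces. The function name is hypothetical.

```shell
# Hypothetical helper: print each fuzzy hash line that appears in more than one
# of the given messages, followed by the files it was found in.
# Assumes `mime stat -F` prints one hash per line.
shared_fuzzy_hashes() {
  local f
  for f in "$@"; do
    # de-duplicate within a message, then tag every non-empty line with its file
    rspamadm mime stat -F "$f" | sort -u | awk -v file="$f" 'NF { print file "\t" $0 }'
  done | awk -F'\t' '
    {
      hash = substr($0, length($1) + 2)   # everything after "file<TAB>"
      files[hash] = files[hash] " " $1
      count[hash]++
    }
    END { for (h in count) if (count[h] > 1) print h ":" files[h] }'
}
```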
Extract Lua Metatokens
View metatokens generated by Lua code:
rspamadm mime stat -m message.eml
Options
-m, --meta Lua metatokens
-b, --bayes Bayes tokens
-F, --fuzzy Fuzzy hashes
-s, --shingles Show shingles for fuzzy hashes
Use Cases
- Debug why Bayes is not working as expected
- Generate fuzzy hashes for manual comparison
- Understand what statistical features are extracted
mime urls
Extract and analyze URLs from messages.
Purpose
Extract URLs as Rspamd sees them, useful for debugging URL-based rules and understanding how URLs are processed.
Common Scenarios
Extract All URLs
# Show full URL information
rspamadm mime urls -f message.eml
This shows URLs as processed by Rspamd, including normalized forms and components.
Get Unique Hosts
# Extract unique hostnames
rspamadm mime urls -H -u message.eml
# Sort by frequency
rspamadm mime urls -H -u --count -s message.eml
Check TLD Distribution
# Extract TLDs only
rspamadm mime urls -t -u message.eml
# Count TLD occurrences
rspamadm mime urls -t -u --count -s message.eml
Analyze Most Common Domains
# Show host counts in reverse order (most common first)
rspamadm mime urls -H -u --count -s -r message.eml
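To rank hosts across several messages at once, you can aggregate the plain -H output with standard tools. This approach does not depend on the exact format that --count prints. The sketch assumes one host per output line. The function name and its arguments are hypothetical.

```shell
# Hypothetical helper: top N hosts across one or more messages, most frequent first.
# Usage: top_hosts N file1.eml [file2.eml ...]
top_hosts() {
  local n="$1"; shift
  local f
  for f in "$@"; do
    rspamadm mime urls -H "$f"
  done | sed '/^[[:space:]]*$/d' | sort | uniq -c | sort -rn | head -n "$n"
}
```

Each output line is a count followed by a host, in the usual uniq -c format.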
Options
-t, --tld Get TLDs only
-H, --host Get hosts only
-f, --full Show piecewise URLs as processed
-u, --unique Print only unique URLs
-s, --sort Sort output
--count Print count of each element
-r, --reverse Reverse sort order
Use Cases
- Debug why a URL rule isn't matching
- Analyze phishing emails to find URL patterns
- Spot redirector and URL-shortener links (URLs are parsed offline; redirects are not fetched)
- Verify URL normalization
mime dump
Dump messages in various formats.
Purpose
Export message content in different structured formats for further processing or analysis.
Common Scenarios
Export for External Processing
# Dump as JSON
rspamadm mime dump -j message.eml
# Dump as UCL
rspamadm mime dump -U message.eml
# Dump as MessagePack
rspamadm mime dump -M message.eml
# Compact output
rspamadm mime dump -j -C message.eml
Process Multiple Messages
# Don't print filenames (for piping)
rspamadm mime dump -j --no-file message1.eml message2.eml > output.json
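Redirecting several dumps into one file, as in the example above, may not produce a single valid JSON value. A variant that writes one JSON file per message avoids that. This sketch uses only the -j and --no-file options shown above. The function name and the naming scheme, which replaces .eml with .json, are hypothetical.

```shell
# Hypothetical helper: dump each message as JSON into its own file
# (message.eml -> message.json), so every output file holds one document.
dump_each_json() {
  local f
  for f in "$@"; do
    rspamadm mime dump -j --no-file "$f" > "${f%.eml}.json"
  done
}
```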
Options
-j, --json JSON output
-U, --ucl UCL output
-M, --messagepack MessagePack output
-C, --compact Compact format
--no-file Don't print filename
Use Cases
- Export message data for analysis in other tools
- Generate structured data for machine learning
- Debug message parsing issues
Practical Examples
Complete Message Analysis
To fully understand how Rspamd processes a message:
#!/bin/bash
# Pass the message path as the first argument (defaults to suspicious.eml)
MESSAGE="${1:-suspicious.eml}"
echo "=== Text Content ==="
rspamadm mime extract -t "$MESSAGE"
echo -e "\n=== URLs ==="
rspamadm mime urls -f "$MESSAGE"
echo -e "\n=== Bayes Tokens ==="
rspamadm mime stat -b "$MESSAGE"
echo -e "\n=== Fuzzy Hashes ==="
rspamadm mime stat -F "$MESSAGE"
echo -e "\n=== Message Structure ==="
rspamadm mime extract -p "$MESSAGE"
Compare Word Extraction Methods
MESSAGE="test.eml"
echo "Stemmed words (Bayes):"
rspamadm mime extract -w -F stem "$MESSAGE" | head -20
echo -e "\nNormalized words:"
rspamadm mime extract -w -F norm "$MESSAGE" | head -20
echo -e "\nRaw words:"
rspamadm mime extract -w -F raw "$MESSAGE" | head -20
URL Analysis for Phishing Detection
# Find all unique domains in a phishing email
rspamadm mime urls -H -u phishing.eml
# Check if legitimate brand is being spoofed
rspamadm mime urls -f phishing.eml | grep -Ei "paypal|bank|amazon"
# Count total URLs
rspamadm mime urls phishing.eml | wc -l
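A keyword grep also matches the brand's legitimate links. A slightly sharper check flags only hosts that contain the brand name but are neither the brand's official domain nor one of its subdomains. This is a minimal sketch. The function name and the paypal/paypal.com pair are illustrative assumptions, and a plain substring test will not catch homoglyph or punycode lookalikes.

```shell
# Hypothetical helper: list hosts that mention a brand keyword but are not the
# official domain or a subdomain of it. Assumes one host per output line.
suspicious_brand_hosts() {
  local msg="$1" brand="$2" official="$3"
  rspamadm mime urls -H -u "$msg" |
    awk -v brand="$brand" -v official="$official" '
      BEGIN { brand = tolower(brand); official = tolower(official) }
      {
        host = tolower($0)
        gsub(/^[ \t]+|[ \t]+$/, "", host)                 # trim surrounding blanks
        if (host == "" || index(host, brand) == 0) next   # keyword absent
        if (host == official) next                        # the official domain itself
        suffix = "." official
        if (length(host) > length(suffix) &&
            substr(host, length(host) - length(suffix) + 1) == suffix) next  # subdomain
        print host
      }'
}
```

For example, suspicious_brand_hosts phishing.eml paypal paypal.com prints hosts such as paypal-secure-login.example. It skips www.paypal.com.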
Tips and Best Practices
- Use -o decoded_utf for internationalized content - Ensures proper UTF-8 handling
- Combine with grep - Pipe output to grep to find specific patterns
- Check both text and HTML - Some content may only appear in HTML parts
- Use --count for frequency analysis - Helps identify patterns in bulk analysis
- Export as JSON - For programmatic processing, JSON is the most versatile format
Related Documentation
- Email Manipulation - Modify message content
- Operations - Log searching with grep
- Development - Corpus testing