Email Analysis & Debugging

These commands help you inspect and analyze email messages to understand how Rspamd processes them.

mime extract

Extract content from MIME messages for analysis.

Purpose

Extract plain text, HTML, words, or structural information from email messages to understand what content Rspamd sees and processes.

Common Scenarios

Debug Text Extraction Issues

When investigating why certain text isn't being analyzed:

# Extract plain text content
rspamadm mime extract -t message.eml

# Extract HTML content
rspamadm mime extract -H message.eml

# Extract in different formats
rspamadm mime extract -t -o decoded message.eml    # Decoded with charset
rspamadm mime extract -t -o decoded_utf message.eml # UTF-8 normalized
rspamadm mime extract -t -o oneline message.eml     # Single line
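When a phrase you expect to be scanned does not seem to be analyzed, a quick check is whether it shows up in the plain text extraction, the HTML extraction, or neither. The following is a minimal sketch, assuming a shell with `grep` available; the function name `find_phrase` is hypothetical and not part of rspamadm. It uses only the `-t` and `-H` flags shown above.

```shell
# Hypothetical helper (not part of rspamadm): report which extraction
# of a message contains a given phrase.
find_phrase() {
    local msg="$1" phrase="$2" found=1

    # grep flags: -q quiet, -i ignore case, -F treat the phrase as a fixed string
    if rspamadm mime extract -t "$msg" | grep -qiF -- "$phrase"; then
        echo "text: found"
        found=0
    else
        echo "text: not found"
    fi

    if rspamadm mime extract -H "$msg" | grep -qiF -- "$phrase"; then
        echo "html: found"
        found=0
    else
        echo "html: not found"
    fi

    # Exit status 0 if the phrase appeared in at least one extraction
    return "$found"
}

# Usage: find_phrase message.eml "click here"
```

A phrase reported as found only in the HTML extraction suggests the content lives in the HTML part alone.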

Analyze Word Extraction

To see exactly what words/tokens are extracted:

# Extract stemmed words (as used by Bayes)
rspamadm mime extract -w message.eml

# Extract normalized words
rspamadm mime extract -w -F norm message.eml

# Extract raw words (no processing)
rspamadm mime extract -w -F raw message.eml

# Extract full word information
rspamadm mime extract -w -F full message.eml
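To see what a given processing step changes, the output of two word formats can be diffed directly. This is a sketch under stated assumptions: the helper name `word_format_diff` is hypothetical, it relies on `mktemp` and `diff` being installed, and it accepts only the format names listed for `-F` in this document (stem, norm, raw, full).

```shell
# Hypothetical helper (not part of rspamadm): diff two word formats
# for the same message. Defaults to comparing raw against stem.
word_format_diff() {
    local msg="$1" fmt_a="${2:-raw}" fmt_b="${3:-stem}" fmt tmpdir status=0

    # Accept only the formats documented for -F
    for fmt in "$fmt_a" "$fmt_b"; do
        case "$fmt" in
            stem|norm|raw|full) ;;
            *) echo "unknown word format: $fmt" >&2; return 2 ;;
        esac
    done

    tmpdir="$(mktemp -d)" || return 2
    rspamadm mime extract -w -F "$fmt_a" "$msg" > "$tmpdir/a.out"
    rspamadm mime extract -w -F "$fmt_b" "$msg" > "$tmpdir/b.out"

    # diff exits with 0 when the outputs are identical and 1 when they differ
    diff "$tmpdir/a.out" "$tmpdir/b.out" || status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Usage: word_format_diff message.eml raw norm
```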

Inspect Message Structure

View part information and HTML structure:

# Show part information
rspamadm mime extract -p message.eml

# Show HTML structure (tags, attributes)
rspamadm mime extract -H -s message.eml

# Show invisible HTML content
rspamadm mime extract -H -i message.eml

Options

-t, --text              Extract plain text
-H, --html              Extract HTML
-o, --output <type>     Output format: raw, content, oneline, decoded, decoded_utf
-w, --words             Extract words
-F, --words-format      Word format: stem, norm, raw, full
-p, --part              Show part information
-s, --structure         Show HTML structure
-i, --invisible         Show invisible HTML content

Related Commands

mime stat - Extract statistical tokens
mime urls - Extract URLs only

mime stat

Extract statistical data from messages.

Purpose

Extract the same tokens and hashes that Rspamd uses for Bayes classification and fuzzy matching.

Common Scenarios

Debug Bayes Classification

See what tokens would be used for Bayes:

# Extract Bayes tokens
rspamadm mime stat -b message.eml

This shows the exact tokens that would be checked against the Bayes database, helping you understand why a message is classified as spam or ham.
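When two messages are classified differently, it can be useful to see how much of their `-b` output they share. The sketch below is illustrative only: `shared_bayes_lines` is a hypothetical name, and it assumes the command prints one token per line, which you should confirm against the output of your rspamadm version before relying on the number.

```shell
# Hypothetical helper (not part of rspamadm): count output lines that
# two messages have in common in their Bayes token listing.
shared_bayes_lines() {
    local a="$1" b="$2" tmpdir
    tmpdir="$(mktemp -d)" || return 1

    # ASSUMPTION: one token per output line.
    # sort -u produces the sorted, de-duplicated input that comm requires.
    rspamadm mime stat -b "$a" | sort -u > "$tmpdir/a.tok"
    rspamadm mime stat -b "$b" | sort -u > "$tmpdir/b.tok"

    # comm -12 keeps only lines present in both files
    comm -12 "$tmpdir/a.tok" "$tmpdir/b.tok" | wc -l | tr -d ' '
    rm -rf "$tmpdir"
}

# Usage: shared_bayes_lines spam.eml ham.eml
```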

Generate Fuzzy Hashes

Extract fuzzy hashes for debugging fuzzy matching:

# Extract fuzzy hashes
rspamadm mime stat -F message.eml

# Include shingles (detailed hash information)
rspamadm mime stat -F -s message.eml

Extract Lua Metatokens

View metatokens generated by Lua code:

rspamadm mime stat -m message.eml

Options

-m, --meta              Lua metatokens
-b, --bayes             Bayes tokens
-F, --fuzzy             Fuzzy hashes
-s, --shingles          Show shingles for fuzzy hashes

Use Cases

Debug why Bayes is not working as expected
Generate fuzzy hashes for manual comparison
Understand what statistical features are extracted

mime urls

Extract and analyze URLs from messages.

Purpose

Extract URLs as Rspamd sees them, useful for debugging URL-based rules and understanding how URLs are processed.

Common Scenarios

Extract All URLs

# Show full URL information
rspamadm mime urls -f message.eml

This shows URLs as processed by Rspamd, including normalized forms and components.

Get Unique Hosts

# Extract unique hostnames
rspamadm mime urls -H -u message.eml

# Sort by frequency
rspamadm mime urls -H -u --count -s message.eml

Check TLD Distribution

# Extract TLDs only
rspamadm mime urls -t -u message.eml

# Count TLD occurrences
rspamadm mime urls -t --count -s message.eml

Analyze Most Common Domains

# Show host counts in reverse order (most common first)
rspamadm mime urls -H -u --count -s -r message.eml
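The commands above work on a single message. To rank hosts across a batch, one option is to collect the unique hosts of each message and count them with standard tools. This is a rough sketch: `rank_hosts` is a hypothetical name, and it assumes `-H -u` prints one host per line with nothing else on the line.

```shell
# Hypothetical helper (not part of rspamadm): rank hosts by the number
# of messages they appear in.
rank_hosts() {
    local msg
    for msg in "$@"; do
        # ASSUMPTION: one host per line; -u lists each host once per message
        rspamadm mime urls -H -u "$msg"
    done | sort | uniq -c | sort -rn
}

# Usage: rank_hosts corpus/*.eml | head -20
```

Because each message contributes a host at most once, the count is the number of messages that mention the host, not the total number of links.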

Options

-t, --tld               Get TLDs only
-H, --host              Get hosts only
-f, --full              Show piecewise URLs as processed
-u, --unique            Print only unique URLs
-s, --sort              Sort output
--count                 Print count of each element
-r, --reverse           Reverse sort order

Use Cases

Debug why a URL rule isn't matching
Analyze phishing emails to find URL patterns
Check if URL redirects are being followed
Verify URL normalization

mime dump

Dump messages in various formats.

Purpose

Export message content in different structured formats for further processing or analysis.

Common Scenarios

Export for External Processing

# Dump as JSON
rspamadm mime dump -j message.eml

# Dump as UCL
rspamadm mime dump -U message.eml

# Dump as MessagePack
rspamadm mime dump -M message.eml

# Compact output
rspamadm mime dump -j -C message.eml

Process Multiple Messages

# Don't print filenames (for piping)
rspamadm mime dump -j --no-file message1.eml message2.eml > output.json
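If a downstream tool expects one document per file, it may be simpler to dump each message separately rather than redirect several dumps into a single file. The following sketch assumes messages use an `.eml` suffix; `dump_each` is a hypothetical name, and only the `-j` and `--no-file` flags from this document are used.

```shell
# Hypothetical helper (not part of rspamadm): write one JSON file per
# message into an output directory.
dump_each() {
    local outdir="$1" msg name
    shift
    mkdir -p "$outdir" || return 1

    for msg in "$@"; do
        # message1.eml becomes message1.json
        name="$(basename "$msg" .eml)"
        rspamadm mime dump -j --no-file "$msg" > "$outdir/$name.json" || return 1
    done
}

# Usage: dump_each dumps message1.eml message2.eml
```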

Options

-j, --json              JSON output
-U, --ucl               UCL output
-M, --messagepack       MessagePack output
-C, --compact           Compact format
--no-file               Don't print filename

Use Cases

Export message data for analysis in other tools
Generate structured data for machine learning
Debug message parsing issues

Practical Examples

Complete Message Analysis

To fully understand how Rspamd processes a message:

#!/bin/bash
# Message to analyze: first argument, or suspicious.eml if none is given
MESSAGE="${1:-suspicious.eml}"

echo "=== Text Content ==="
rspamadm mime extract -t "$MESSAGE"

echo -e "\n=== URLs ==="
rspamadm mime urls -f "$MESSAGE"

echo -e "\n=== Bayes Tokens ==="
rspamadm mime stat -b "$MESSAGE"

echo -e "\n=== Fuzzy Hashes ==="
rspamadm mime stat -F "$MESSAGE"

echo -e "\n=== Message Structure ==="
rspamadm mime extract -p "$MESSAGE"

Compare Word Extraction Methods

MESSAGE="test.eml"

echo "Stemmed words (Bayes):"
rspamadm mime extract -w -F stem "$MESSAGE" | head -20

echo -e "\nNormalized words:"
rspamadm mime extract -w -F norm "$MESSAGE" | head -20

echo -e "\nRaw words:"
rspamadm mime extract -w -F raw "$MESSAGE" | head -20

URL Analysis for Phishing Detection

# Find all unique domains in a phishing email
rspamadm mime urls -H -u phishing.eml

# Look for well-known brand names in the URLs (possible spoofing)
rspamadm mime urls -f phishing.eml | grep -iE "paypal|bank|amazon"

# Count total URLs
rspamadm mime urls phishing.eml | wc -l
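A complementary check is to list the hosts in a message that are not on a list of hosts you expect. This is a minimal sketch, assuming `-H -u` prints one host per line and that the allowlist is a plain text file with one host per line; both `unexpected_hosts` and the allowlist file are hypothetical and not part of rspamadm.

```shell
# Hypothetical helper (not part of rspamadm): print hosts from a message
# that do not appear in an allowlist file (one host per line).
unexpected_hosts() {
    local msg="$1" allowlist="$2"

    # grep flags: -v invert the match, -i ignore case, -x match whole lines,
    # -F fixed strings, -f read the patterns from the allowlist file.
    # ASSUMPTION: rspamadm prints one host per line here.
    rspamadm mime urls -H -u "$msg" | grep -vixFf "$allowlist"
}

# Usage: unexpected_hosts phishing.eml allowed-hosts.txt
```

Whole-line matching matters here: a lookalike host that merely contains an allowed name is still reported.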

Tips and Best Practices

Use -o decoded_utf for internationalized content - Ensures proper UTF-8 handling
Combine with grep - Pipe output to grep for specific patterns
Check both text and HTML - Some content may only appear in HTML parts
Use --count for frequency analysis - Helps identify patterns in bulk analysis
Export as JSON - For programmatic processing, JSON format is most versatile

Related Documentation

Email Manipulation - Modify message content
Operations - Log searching with grep
Development - Corpus testing
