Module rspamd_textpart
This module provides methods to inspect and manipulate the data of text parts. Text parts
can be obtained from an rspamd_task object
by calling the method task:get_text_parts().
Example:
rspamd_config.R_EMPTY_IMAGE = function(task)
  local parts = task:get_text_parts()
  if parts then
    for _, part in ipairs(parts) do
      if part:is_empty() then
        -- An empty text part that is accompanied by images
        local images = task:get_images()
        if images and #images > 0 then
          return true
        end
        return false
      end
    end
  end
  return false
end
Brief content:
Methods:
Method | Description |
---|---|
text_part:is_utf() | Return TRUE if part is valid UTF-8 text. |
text_part:has_8bit_raw() | Return TRUE if a part has raw 8bit characters. |
text_part:has_8bit() | Return TRUE if a part has encoded 8bit characters. |
text_part:get_content([type]) | Get the text of the part (html tags stripped). |
text_part:get_raw_content() | Get the original text of the part. |
text_part:get_content_oneline() | Get the text of the part (html tags and newlines stripped). |
text_part:get_length() | Get length of the text of the part. |
text_part:get_raw_length() | Get length of the raw content of the part (e.g. HTML with tags unstripped). |
text_part:get_urls_length() | Get length of the URLs within the part. |
text_part:get_lines_count() | Get the number of lines in the part. |
text_part:get_stats() | Returns a table with the following data. |
text_part:get_words_count() | Get the number of words in the part. |
text_part:get_words([how]) | Get words in the part. |
text_part:filter_words(regexp[, how[, max]]) | Filter words using a regexp. |
text_part:is_empty() | Returns true if the specified part is empty. |
text_part:is_html() | Returns true if the specified part has HTML content. |
text_part:get_html() | Returns html content of the specified part. |
text_part:get_language() | Returns the code of the most used unicode script in the text part. |
text_part:get_charset() | Returns part real charset. |
text_part:get_languages() | Returns array of tables of all languages detected for a part. |
text_part:get_fuzzy_hashes(mempool) | Returns the direct hash of the text part as a string and an array [1..32] of shingles, each represented as the following table. |
text_part:get_mimepart() | Returns the mime part object corresponding to this text part. |
Methods
The module rspamd_textpart defines the following methods.
Method text_part:is_utf()
Return TRUE if part is valid UTF-8 text
Parameters:
No parameters
Returns:
{boolean}
: true if the part is a valid UTF-8 part
Back to module description.
Method text_part:has_8bit_raw()
Return TRUE if a part has raw 8bit characters
Parameters:
No parameters
Returns:
{boolean}
: true if a part has raw 8bit characters
Back to module description.
Method text_part:has_8bit()
Return TRUE if a part has encoded 8bit characters
Parameters:
No parameters
Returns:
{boolean}
: true if a part has encoded 8bit characters
Back to module description.
Method text_part:get_content([type])
Get the text of the part (HTML tags stripped). The optional type argument defines the type of content to get:
- `content` (default): UTF-8 content with HTML tags stripped and newlines preserved
- `content_oneline`: UTF-8 content with HTML tags and newlines stripped
- `raw`: raw content, neither MIME decoded nor converted to UTF-8
- `raw_parsed`: raw content, MIME decoded, not converted to UTF-8
- `raw_utf`: raw content, MIME decoded and converted to UTF-8 (but with HTML tags and newlines)
Parameters:
type {string}
: optional type of content to get (one of the values listed above)
Returns:
{text}
: UTF-8 encoded content of the part (zero-copy if not converted to a Lua string)
Back to module description.
Method text_part:get_raw_content()
Get the original text of the part
Parameters:
No parameters
Returns:
{text}
: UTF-8 encoded content of the part (zero-copy if not converted to a Lua string)
Back to module description.
Method text_part:get_content_oneline()
Get the text of the part (html tags and newlines stripped)
Parameters:
No parameters
Returns:
{text}
: UTF-8 encoded content of the part (zero-copy if not converted to a Lua string)
Back to module description.
Method text_part:get_length()
Get length of the text of the part
Parameters:
No parameters
Returns:
{integer}
: length of part in bytes
Back to module description.
Method text_part:get_raw_length()
Get length of the raw content of the part (e.g. HTML with tags unstripped)
Parameters:
No parameters
Returns:
{integer}
: length of part in bytes
Back to module description.
Method text_part:get_urls_length()
Get length of the URLs within the part
Parameters:
No parameters
Returns:
{integer}
: length of urls in bytes
Back to module description.
Method text_part:get_lines_count()
Get the number of lines in the part
Parameters:
No parameters
Returns:
{integer}
: number of lines in the part
Back to module description.
Method text_part:get_stats()
Returns a table with the following data:
- `lines`: number of lines
- `spaces`: number of spaces
- `double_spaces`: number of double spaces
- `empty_lines`: number of empty lines
- `non_ascii_characters`: number of non-ASCII characters
- `ascii_characters`: number of ASCII characters
Parameters:
No parameters
Returns:
{table}
: table of stats
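As a rough illustration of how the stats table might be consumed, the sketch below computes the share of non-ASCII characters. The helper name `non_ascii_ratio` is hypothetical and not part of the API; the field names are the ones listed above, and the `sample_stats` table merely stands in for the result of calling `get_stats()` on a real part.

```lua
-- Hypothetical helper (not part of rspamd): fraction of non-ASCII
-- characters in a table shaped like the result of get_stats().
local function non_ascii_ratio(stats)
  local non_ascii = stats.non_ascii_characters or 0
  local ascii = stats.ascii_characters or 0
  local total = non_ascii + ascii
  if total == 0 then
    return 0 -- empty part: avoid division by zero
  end
  return non_ascii / total
end

-- Stand-in data; inside a rule this would come from part:get_stats()
local sample_stats = {
  lines = 3, spaces = 10, double_spaces = 0, empty_lines = 1,
  non_ascii_characters = 25, ascii_characters = 75,
}
print(non_ascii_ratio(sample_stats))
```

Missing fields are treated as zero here, which is a design choice of the sketch rather than documented behaviour.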
Back to module description.
Method text_part:get_words_count()
Get the number of words in the part
Parameters:
No parameters
Returns:
{integer}
: number of words in the part
Back to module description.
Method text_part:get_words([how])
Get words in the part. The optional how argument defines the type of words returned:
- `stem`: stemmed words (default)
- `norm`: normalised words (UTF normalised and lowercased)
- `raw`: raw words in UTF (if possible)
- `full`: list of tables, each table has the following fields:
  - [1] - stemmed word
  - [2] - normalised word
  - [3] - raw word
  - [4] - flags (table of strings)
Parameters:
how {string}
: optional type of words to return (one of the values listed above)
Returns:
{table/strings}
: words in the part
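To make the layout of the `full` format more concrete, here is a small sketch that pulls the normalised form out of each entry. The helper `normalised_words` is hypothetical, and `sample_words` is made-up data standing in for the result of `get_words('full')`; the flag values in particular are placeholders.

```lua
-- Hypothetical helper (not part of rspamd): extract field [2]
-- (the normalised word) from each entry of a 'full' words list.
local function normalised_words(full_words)
  local out = {}
  for _, w in ipairs(full_words or {}) do
    out[#out + 1] = w[2]
  end
  return out
end

-- Made-up entries: { stemmed, normalised, raw, flags }
local sample_words = {
  { 'run', 'running', 'Running', { 'placeholder_flag' } },
  { 'mail', 'mails', 'Mails', { 'placeholder_flag' } },
}
local norm = normalised_words(sample_words)
print(table.concat(norm, ','))
```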
Back to module description.
Method text_part:filter_words(regexp[, how[, max]])
Filter words using a regexp. The optional how argument defines the type of words returned:
- `stem`: stemmed words (default)
- `norm`: normalised words (UTF normalised and lowercased)
- `raw`: raw words in UTF (if possible)
- `full`: list of tables, each table has the following fields:
  - [1] - stemmed word
  - [2] - normalised word
  - [3] - raw word
  - [4] - flags (table of strings)
Parameters:
regexp {rspamd_regexp}
: regexp to match
how {string}
: what words to extract
max {number}
: maximum number of hits returned (all hits if <= 0 or nil)
Returns:
{table/strings}
: words matching regexp
Back to module description.
Method text_part:is_empty()
Returns true if the specified part is empty
Parameters:
No parameters
Returns:
{bool}
: whether a part is empty
Back to module description.
Method text_part:is_html()
Returns true if the specified part has HTML content
Parameters:
No parameters
Returns:
{bool}
: whether a part is HTML part
Back to module description.
Method text_part:get_html()
Returns html content of the specified part
Parameters:
No parameters
Returns:
{html}
: html content
Back to module description.
Method text_part:get_language()
Returns the code of the most used unicode script in the text part. Does not work with raw parts
Parameters:
No parameters
Returns:
{string}
: short abbreviation (such as `ru`) for the script's language
Back to module description.
Method text_part:get_charset()
Returns part real charset
Parameters:
No parameters
Returns:
{string}
: charset of the part
Back to module description.
Method text_part:get_languages()
Returns array of tables of all languages detected for a part:
- 'code': language code (short string)
- 'prob': logarithm of probability
Parameters:
No parameters
Returns:
{array|tables}
: all languages detected for the part
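A possible way to use this result is to pick the entry with the highest `prob`. The sketch below assumes only what is stated above (each entry has `code` and `prob`, with `prob` being a logarithm of probability, so a larger value means a more likely language). The helper name `most_probable_language` and the sample data are hypothetical.

```lua
-- Hypothetical helper (not part of rspamd): return the code of the entry
-- with the largest log-probability, or nil if the list is empty or missing.
local function most_probable_language(languages)
  local best
  for _, lang in ipairs(languages or {}) do
    if not best or lang.prob > best.prob then
      best = lang
    end
  end
  return best and best.code or nil
end

-- Made-up data standing in for part:get_languages()
local sample_langs = {
  { code = 'de', prob = -2.5 },
  { code = 'en', prob = -0.2 },
}
print(most_probable_language(sample_langs))
```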
Back to module description.
Method text_part:get_fuzzy_hashes(mempool)
Returns the direct hash of the text part as a string and an array [1..32] of shingles, each represented as the following table:
- [1] - 64 bit fuzzy hash represented as a string
- [2..4] - strings used to generate this hash
Parameters:
mempool {rspamd_mempool}
: memory pool (usually task pool)
Returns:
{string,array|tables}
: fuzzy hashes calculated
Back to module description.
Method text_part:get_mimepart()
Returns the mime part object corresponding to this text part
Parameters:
No parameters
Returns:
{mimepart}
: mimepart object
Back to module description.
Back to top.
Module rspamd_mimepart
This module provides access to mime parts found in a message
Example:
rspamd_config.MISSING_CONTENT_TYPE = function(task)
  local parts = task:get_parts()
  if parts and #parts > 1 then
    -- We have more than one part
    for _, p in ipairs(parts) do
      local ct = p:get_header('Content-Type')
      -- And some parts have no Content-Type header
      if not ct then
        return true
      end
    end
  end
  return false
end
Brief content:
Methods:
Method | Description |
---|---|
mime_part:get_header(name[, case_sensitive]) | Get decoded value of a header specified with optional case_sensitive flag. |
mime_part:get_header_raw(name[, case_sensitive]) | Get raw value of a header specified with optional case_sensitive flag. |
mime_part:get_header_full(name[, case_sensitive]) | Get raw value of a header specified with optional case_sensitive flag. |
mime_part:get_header_count(name[, case_sensitive]) | Lightweight version if you need just a header's count. |
mime_part:get_raw_headers() | Get all undecoded headers of a mime part as a string. |
mime_part:get_headers() | Get all undecoded headers of a mime part as a string. |
mime_part:get_content() | Get the parsed content of part. |
mime_part:get_raw_content() | Get the raw content of part. |
mime_part:get_length() | Get length of the content of the part. |
mime_part:get_type() | Extract content-type string of the mime part. |
mime_part:get_type_full() | Extract content-type string of the mime part with all attributes. |
mime_part:get_detected_type() | Extract content-type string of the mime part. |
mime_part:get_detected_type_full() | Extract content-type string of the mime part with all attributes. |
mime_part:get_detected_ext() | Returns a msdos extension name according to lua_magic detection. |
mime_part:get_cte() | Extract content-transfer-encoding for a part. |
mime_part:get_filename() | Extract filename associated with mime part if it is an attachment. |
mime_part:is_image() | Returns true if mime part is an image. |
mime_part:get_image() | Returns rspamd_image structure associated with this part. |
mime_part:is_archive() | Returns true if mime part is an archive. |
mime_part:is_attachment() | Returns true if mime part looks like an attachment. |
mime_part:get_archive() | Returns rspamd_archive structure associated with this part. |
mime_part:is_multipart() | Returns true if mime part is a multipart part. |
mime_part:is_message() | Returns true if mime part is a message part (message/rfc822). |
mime_part:get_boundary() | Returns boundary for a part (extracted from parent multipart for normal parts and from the part itself for multipart). |
mime_part:get_enclosing_boundary() | Returns an enclosing boundary for a part even for multiparts. |
mime_part:get_children() | Returns rspamd_mimepart table of part's children. |
mime_part:is_text() | Returns true if mime part is a text part. |
mime_part:get_text() | Returns rspamd_textpart structure associated with this part. |
mime_part:get_digest() | Returns the unique digest for this mime part. |
mime_part:get_id() | Returns the order of the part in parts list. |
mime_part:is_broken() | Returns true if mime part has incorrectly specified content type. |
mime_part:headers_foreach(callback, [params]) | This method calls callback for each header that satisfies some condition. |
mime_part:get_parent() | Returns parent part for this part. |
mime_part:get_specific() | Returns specific lua content for this part. |
mime_part:set_specific(<any>) | Sets a specific content for this part. |
mime_part:is_specific(<any>) | Returns true if part has specific lua content. |
mime_part:get_urls([need_emails\|list_protos][, need_images]) | Get all URLs found in a mime part. |
mime_part:get_stats() | Returns a table with the following data. |
Methods
The module rspamd_mimepart defines the following methods.
Method mime_part:get_header(name[, case_sensitive])
Get decoded value of a header specified with optional case_sensitive flag. By default, headers are searched case-insensitively.
Parameters:
name {string}
: name of header to get
case_sensitive {boolean}
: case sensitiveness flag to search for a header
Returns:
{string}
: decoded value of a header
Back to module description.
Method mime_part:get_header_raw(name[, case_sensitive])
Get raw value of a header specified with optional case_sensitive flag. By default, headers are searched case-insensitively.
Parameters:
name {string}
: name of header to get
case_sensitive {boolean}
: case sensitiveness flag to search for a header
Returns:
{string}
: raw value of a header
Back to module description.
Method mime_part:get_header_full(name[, case_sensitive])
Get raw value of a header specified with optional case_sensitive flag. By default, headers are searched case-insensitively. This method returns more information about the header as a list of tables with the following structure:
- `name` - name of a header
- `value` - raw value of a header
- `decoded` - decoded value of a header
- `tab_separated` - `true` if a header and a value are separated by a `tab` character
- `empty_separator` - `true` if there is no separator between a header and a value
Parameters:
name {string}
: name of header to get
case_sensitive {boolean}
: case sensitiveness flag to search for a header
Returns:
{list of tables}
: all values of a header as specified above
Example:
function check_header_delimiter_tab(task, header_name)
  -- Fall back to an empty list if the header is absent
  for _, rh in ipairs(task:get_header_full(header_name) or {}) do
    if rh['tab_separated'] then return true end
  end
  return false
end
Back to module description.
Method mime_part:get_header_count(name[, case_sensitive])
Lightweight version if you need just a header's count. By default, headers are searched case-insensitively.
Parameters:
name {string}
: name of header to get
case_sensitive {boolean}
: case sensitiveness flag to search for a header
Returns:
{number}
: number of header's occurrences or 0 if not found
Back to module description.
Method mime_part:get_raw_headers()
Get all undecoded headers of a mime part as a string
Parameters:
No parameters
Returns:
{rspamd_text}
: all raw headers for a message as opaque text
Back to module description.
Method mime_part:get_headers()
Get all undecoded headers of a mime part as a string
Parameters:
No parameters
Returns:
{rspamd_text}
: all raw headers for a message as opaque text
Back to module description.
Method mime_part:get_content()
Get the parsed content of part
Parameters:
No parameters
Returns:
{text}
: opaque text object (zero-copy if not converted to a Lua string)
Back to module description.
Method mime_part:get_raw_content()
Get the raw content of part
Parameters:
No parameters
Returns:
{text}
: opaque text object (zero-copy if not converted to a Lua string)
Back to module description.
Method mime_part:get_length()
Get length of the content of the part
Parameters:
No parameters
Returns:
{integer}
: length of part in bytes
Back to module description.
Method mime_part:get_type()
Extract content-type string of the mime part
Parameters:
No parameters
Returns:
{string,string}
: content type in form 'type','subtype'
Back to module description.
Method mime_part:get_type_full()
Extract content-type string of the mime part with all attributes
Parameters:
No parameters
Returns:
{string,string,table}
: content type in form 'type','subtype', {attrs}
Back to module description.
Method mime_part:get_detected_type()
Extract content-type string of the mime part. Use lua_magic detection
Parameters:
No parameters
Returns:
{string,string}
: content type in form 'type','subtype'
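One conceivable use of the detected type is to compare it with the type declared in the part's headers. The following is only a sketch under stated assumptions: `type_mismatch` is a hypothetical helper, and `stub_part` is a plain Lua table that imitates the two methods so that the snippet runs outside rspamd. In a real rule the part would come from `task:get_parts()`.

```lua
-- Hypothetical helper (not part of rspamd): true when the declared
-- content type differs from the detected one (case-insensitive).
local function type_mismatch(part)
  local t, st = part:get_type()
  local dt, dst = part:get_detected_type()
  if not (t and st and dt and dst) then
    return false -- not enough information to compare
  end
  return t:lower() ~= dt:lower() or st:lower() ~= dst:lower()
end

-- Stub imitating a mime part; values are made up for illustration
local stub_part = {
  get_type = function() return 'image', 'png' end,
  get_detected_type = function() return 'application', 'zip' end,
}
print(type_mismatch(stub_part))
```

Returning false when either type is unavailable is a conservative choice made for this sketch.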
Back to module description.
Method mime_part:get_detected_type_full()
Extract content-type string of the mime part with all attributes. Use lua_magic detection
Parameters:
No parameters
Returns:
{string,string,table}
: content type in form 'type','subtype', {attrs}
Back to module description.
Method mime_part:get_detected_ext()
Returns a msdos extension name according to lua_magic detection
Parameters:
No parameters
Returns:
{string}
: detected extension (see lua_magic.types)
Back to module description.
Method mime_part:get_cte()
Extract content-transfer-encoding for a part
Parameters:
No parameters
Returns:
{string}
: content transfer encoding (e.g. `base64` or `7bit`)
Back to module description.
Method mime_part:get_filename()
Extract filename associated with mime part if it is an attachment
Parameters:
No parameters
Returns:
{string}
: filename or `nil` if no file is associated with this part
Back to module description.
Method mime_part:is_image()
Returns true if mime part is an image
Parameters:
No parameters
Returns:
{bool}
: true if a part is an image
Back to module description.
Method mime_part:get_image()
Returns rspamd_image structure associated with this part. This structure has the following methods:
- `get_width` - return width of an image in pixels
- `get_height` - return height of an image in pixels
- `get_type` - return string representation of image's type (e.g. 'jpeg')
- `get_filename` - return string with image's file name
- `get_size` - return size in bytes
Parameters:
No parameters
Returns:
{rspamd_image}
: image structure or nil if a part is not an image
Back to module description.
Method mime_part:is_archive()
Returns true if mime part is an archive
Parameters:
No parameters
Returns:
{bool}
: true if a part is an archive
Back to module description.
Method mime_part:is_attachment()
Returns true if mime part looks like an attachment
Parameters:
No parameters
Returns:
{bool}
: true if a part looks like an attachment
Back to module description.
Method mime_part:get_archive()
Returns rspamd_archive structure associated with this part. This structure has the following methods:
- `get_files` - return list of strings with filenames inside archive
- `get_files_full` - return list of tables with all information about files
- `is_encrypted` - return true if an archive is encrypted
- `get_type` - return string representation of archive's type (e.g. 'zip')
- `get_filename` - return string with archive's file name
- `get_size` - return size in bytes
Parameters:
No parameters
Returns:
{rspamd_archive}
: archive structure or nil if a part is not an archive
Back to module description.
Method mime_part:is_multipart()
Returns true if mime part is a multipart part
Parameters:
No parameters
Returns:
{bool}
: true if a part is a multipart part
Back to module description.
Method mime_part:is_message()
Returns true if mime part is a message part (message/rfc822)
Parameters:
No parameters
Returns:
{bool}
: true if a part is a message part
Back to module description.
Method mime_part:get_boundary()
Returns boundary for a part (extracted from parent multipart for normal parts and from the part itself for multipart)
Parameters:
No parameters
Returns:
{string}
: boundary value or nil
Back to module description.
Method mime_part:get_enclosing_boundary()
Returns an enclosing boundary for a part even for multiparts. For normal parts
this method is identical to get_boundary
Parameters:
No parameters
Returns:
{string}
: boundary value or nil
Back to module description.
Method mime_part:get_children()
Returns rspamd_mimepart table of part's children. Returns nil if mime part is not multipart or a message part.
Parameters:
No parameters
Returns:
{rspamd_mimepart}
: table of children
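Because children can themselves be multipart, a part tree is naturally walked recursively. The sketch below counts leaf parts; it relies only on the behaviour described above (a table of children, or nil for parts that are neither multipart nor message parts). The names `count_leaf_parts` and `make_part` are hypothetical, and `make_part` builds stub tables so the snippet can run outside rspamd.

```lua
-- Hypothetical helper (not part of rspamd): count parts without children
local function count_leaf_parts(part)
  local children = part:get_children()
  if not children then
    return 1 -- not a multipart or message part: this is a leaf
  end
  local n = 0
  for _, child in ipairs(children) do
    n = n + count_leaf_parts(child)
  end
  return n
end

-- Stub constructor imitating mime_part:get_children()
local function make_part(children)
  return { get_children = function() return children end }
end

-- multipart -> { leaf, multipart -> { leaf, leaf } }
local tree = make_part({
  make_part(nil),
  make_part({ make_part(nil), make_part(nil) }),
})
print(count_leaf_parts(tree))
```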
Back to module description.
Method mime_part:is_text()
Returns true if mime part is a text part
Parameters:
No parameters
Returns:
{bool}
: true if a part is a text part
Back to module description.
Method mime_part:get_text()
Returns rspamd_textpart structure associated with this part.
Parameters:
No parameters
Returns:
{rspamd_textpart}
: textpart structure or nil if a part is not a text part
Back to module description.
Method mime_part:get_digest()
Returns the unique digest for this mime part
Parameters:
No parameters
Returns:
{string}
: 128 characters hex string with digest of the part
Back to module description.
Method mime_part:get_id()
Returns the order of the part in parts list
Parameters:
No parameters
Returns:
{number}
: index of the part (starting from 1 as it is Lua API)
Back to module description.
Method mime_part:is_broken()
Returns true if mime part has incorrectly specified content type
Parameters:
No parameters
Returns:
{bool}
: true if a part has bad content type
Back to module description.
Method mime_part:headers_foreach(callback, [params])
This method calls callback for each header that satisfies some condition.
By default, all headers are iterated until callback returns true; nil or false means that iteration continues.
Params could be as follows:
- `full`: header value is a full table of all attributes (see task:get_header_full for details)
- `regexp`: return headers that satisfy the specified regexp
Parameters:
callback {function}
: function from header name and header value
params {table}
: optional parameters
Returns:
No return
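The callback contract can be illustrated with a short sketch that collects header names and stops at the first match. Everything here besides the `headers_foreach(callback, [params])` shape is an assumption: `find_header` is a hypothetical helper, and `header_stub` is a plain table whose `headers_foreach` imitates the documented behaviour (stop when the callback returns true) so the example runs outside rspamd.

```lua
-- Hypothetical helper (not part of rspamd): return the value of the first
-- header whose name equals `wanted`, plus the names visited along the way.
local function find_header(part, wanted)
  local seen, found = {}, nil
  part:headers_foreach(function(name, value)
    seen[#seen + 1] = name
    if name == wanted then
      found = value
      return true -- stop iterating
    end
    return false -- continue with the next header
  end)
  return found, seen
end

-- Stub imitating mime_part:headers_foreach(); header data is made up
local header_stub = {
  headers_foreach = function(self, callback, params)
    local headers = {
      { 'Content-Type', 'text/plain' },
      { 'Content-Transfer-Encoding', '7bit' },
      { 'Content-Disposition', 'inline' },
    }
    for _, h in ipairs(headers) do
      if callback(h[1], h[2]) then break end
    end
  end,
}
local cte, visited = find_header(header_stub, 'Content-Transfer-Encoding')
print(cte, #visited)
```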
Back to module description.
Method mime_part:get_parent()
Returns parent part for this part
Parameters:
No parameters
Returns:
{rspamd_mimepart}
: parent part or nil
Back to module description.
Method mime_part:get_specific()
Returns specific lua content for this part
Parameters:
No parameters
Returns:
{any}
: specific lua content
Back to module description.
Method mime_part:set_specific(<any>)
Sets a specific content for this part
Parameters:
No parameters
Returns:
{any}
: previous specific lua content (or nil)
Back to module description.
Method mime_part:is_specific(<any>)
Returns true if part has specific lua content
Parameters:
No parameters
Returns:
{boolean}
: flag
Back to module description.
Method mime_part:get_urls([need_emails|list_protos][, need_images])
Get all URLs found in a mime part. Telephone URLs and emails are not included unless explicitly asked for in list_protos.
Parameters:
need_emails {boolean}
: if `true` then also return email URLs; this can also be a comma-separated string of the desired protocols or a table (e.g. `mailto` or `telephone`)
need_images {boolean}
: return URLs from images as well
Returns:
{table rspamd_url}
: list of all urls found
Back to module description.
Method mime_part:get_stats()
Returns a table with the following data:
- `lines`: number of lines
- `spaces`: number of spaces
- `double_spaces`: number of double spaces
- `empty_lines`: number of empty lines
- `non_ascii_characters`: number of non-ASCII characters
- `ascii_characters`: number of ASCII characters
Parameters:
No parameters
Returns:
{table}
: table of stats
Back to module description.
Back to top.