Module rspamd_parsers
This module contains Lua-C interfaces to Rspamd parsers of various kinds.
Brief content:
Functions:
Function | Description |
---|---|
parsers.tokenize_text(input[, exceptions]) | Create tokens from a text using optional exceptions list. |
parsers.parse_html(input) | Parses HTML and returns the corresponding text. |
parsers.parse_mail_address(str, [pool]) | Parses email addresses and returns a table of tables, one per address. |
parsers.parse_content_type(ct_string, mempool) | Parses content-type string to a table. |
parsers.parse_smtp_date(str[, local_tz]) | Converts an SMTP date string to a Unix timestamp. |
Functions
The module rspamd_parsers defines the following functions.
Function parsers.tokenize_text(input[, exceptions])
Creates tokens from a text, using an optional list of exceptions.
Parameters:
input {text/string}
: input data
exceptions {table}
: a table of pairs containing <start_pos,length> of exceptions in the input
Returns:
{table/strings}
: list of strings representing words in the text
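A minimal sketch of how the exceptions list might be built and passed in, assuming the module is loaded with require "rspamd_parsers" inside the Rspamd Lua runtime. The helper find_exceptions and its base argument are hypothetical, and whether start_pos is counted from 0 or from 1 is not stated in this reference, so the offset is left configurable.

```lua
-- Hypothetical helper: collect {start_pos, length} pairs for every match
-- of a Lua pattern. `base` is the index assigned to the first byte
-- (0 or 1); which one the tokenizer expects is an assumption to verify.
local function find_exceptions(text, pattern, base)
  local found, init = {}, 1
  while true do
    local s, e = text:find(pattern, init)
    if not s or e < s then break end -- no match, or an empty match
    found[#found + 1] = { s - 1 + base, e - s + 1 }
    init = e + 1
  end
  return found
end

local text = "Visit http://example.com for details"
local exceptions = find_exceptions(text, "https?://%S+", 0)

-- rspamd_parsers only exists inside Rspamd, so the call is guarded.
local ok, parsers = pcall(require, "rspamd_parsers")
if ok then
  local words = parsers.tokenize_text(text, exceptions)
  for i, word in ipairs(words) do
    print(i, word)
  end
end
```

The second argument can be omitted entirely when no part of the input needs to be excluded.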
Function parsers.parse_html(input)
Parses HTML and returns the corresponding text.
Parameters:
in {string|text}
: input HTML
Returns:
{rspamd_text}
: processed text with no HTML tags
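A short sketch of one way to call this function and obtain an ordinary Lua string. The wrapper html_to_plain is hypothetical, and it assumes that tostring() can be applied to the returned rspamd_text object and that nil may come back for unusable input.

```lua
-- Hypothetical wrapper: convert the returned rspamd_text to a Lua string.
-- The parsers module is passed in so the function can be tried with a stub.
local function html_to_plain(parsers, html)
  local text = parsers.parse_html(html)
  if text == nil then
    return nil
  end
  return tostring(text)
end

-- rspamd_parsers only exists inside Rspamd, so the call is guarded.
local ok, parsers = pcall(require, "rspamd_parsers")
if ok then
  print(html_to_plain(parsers, "<p>Hello <b>world</b></p>"))
end
```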
Function parsers.parse_mail_address(str, [pool])
Parses email addresses and returns a table of tables, one per address, in the following format:
raw
- the original value without any processing
name
- name of the internet address in UTF8, e.g. for Vsevolod Stakhov <blah@foo.com> it returns Vsevolod Stakhov
addr
- address part of the address
user
- user part (if present) of the address, e.g. blah
domain
- domain part (if present), e.g. foo.com
flags
- table with the following keys set to true if the given condition is fulfilled:
  - [valid] - valid SMTP address in conformity with https://tools.ietf.org/html/rfc5321#section-4.1
  - [ip] - domain is an IPv4/IPv6 address
  - [braced] - address in angle brackets, e.g. <blah@foo.com>
  - [quoted] - quoted user part
  - [empty] - empty address
  - [backslash] - user part contains a backslash
  - [8bit] - contains 8bit characters
Parameters:
str {string}
: input string
pool {rspamd_mempool}
: memory pool to use
Returns:
{table/tables}
: parsed list of mail addresses
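The result layout described above can be walked as in the following sketch. The helper summarize_address is hypothetical, and the example assumes that the function may return nil when nothing can be parsed.

```lua
-- Hypothetical helper: one-line summary of a single parsed address entry.
local function summarize_address(entry)
  local flags = {}
  for name, is_set in pairs(entry.flags or {}) do
    if is_set then
      flags[#flags + 1] = name
    end
  end
  table.sort(flags) -- pairs() order is unspecified, so sort for stable output
  return string.format("%s <%s> [%s]",
    entry.name or "", entry.addr or "", table.concat(flags, ","))
end

-- rspamd_parsers only exists inside Rspamd, so the call is guarded.
local ok, parsers = pcall(require, "rspamd_parsers")
if ok then
  -- The pool argument is optional and omitted here.
  local addrs = parsers.parse_mail_address("Vsevolod Stakhov <blah@foo.com>")
  for _, entry in ipairs(addrs or {}) do
    print(summarize_address(entry))
  end
end
```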
Function parsers.parse_content_type(ct_string, mempool)
Parses a content-type string to a table with the following keys:
- type
- subtype
- charset
- boundary
- other attributes
Parameters:
ct_string {string}
: content type as string
mempool {rspamd_mempool}
: needed to store temporary data (e.g. task pool)
Returns:
- table, or nil if the content type cannot be parsed
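A hedged sketch of a call with a standalone memory pool. It assumes that the rspamd_mempool module provides create() and that pools have a destroy() method; inside a rule, the task pool mentioned above would normally be passed instead. The helper mime_type is hypothetical.

```lua
-- Hypothetical helper: join type and subtype into a lowercase "type/subtype".
local function mime_type(ct)
  if not ct or not ct.type or not ct.subtype then
    return nil
  end
  return string.lower(ct.type .. "/" .. ct.subtype)
end

-- rspamd_parsers only exists inside Rspamd, so the call is guarded.
local ok, parsers = pcall(require, "rspamd_parsers")
if ok then
  -- Assumption: a temporary pool created and destroyed by hand.
  local rspamd_mempool = require "rspamd_mempool"
  local pool = rspamd_mempool.create()
  local ct = parsers.parse_content_type('text/html; charset="utf-8"', pool)
  print(mime_type(ct), ct and ct.charset)
  pool:destroy()
end
```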
Function parsers.parse_smtp_date(str[, local_tz])
Converts an SMTP date string to a Unix timestamp.
Parameters:
str {string}
: input string
local_tz {boolean}
: convert to local tz if true
Returns:
{number}
: time as unix timestamp (converted to float)
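A brief sketch of parsing a date header value and printing the result. The helper format_utc is hypothetical and uses only the standard os.date function; the example assumes the parser may return nil for a string it cannot handle.

```lua
-- Hypothetical helper: render a Unix timestamp as an ISO 8601 UTC string.
-- The timestamp may be a float, so it is floored before formatting.
local function format_utc(ts)
  return os.date("!%Y-%m-%dT%H:%M:%SZ", math.floor(ts))
end

-- rspamd_parsers only exists inside Rspamd, so the call is guarded.
local ok, parsers = pcall(require, "rspamd_parsers")
if ok then
  local ts = parsers.parse_smtp_date("Tue, 15 Nov 1994 08:12:31 +0200")
  if ts then
    print(ts, format_utc(ts))
  end
end
```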