Module rspamd_parsers
This module contains Lua-C interfaces to Rspamd parsers of various kinds.
Brief content:
Functions:
| Function | Description | 
|---|---|
| parsers.tokenize_text(input[, exceptions]) | Creates tokens from a text using an optional exceptions list. |
| parsers.parse_html(input) | Parses HTML and returns the corresponding plain text. |
| parsers.parse_html_content(input, mempool) | Parses HTML and returns the HTML content object for structure analysis. |
| parsers.parse_mail_address(str, [pool]) | Parses an email address string and returns a list of tables, one per parsed address. |
| parsers.parse_content_type(ct_string, mempool) | Parses a content-type string to a table. |
| parsers.parse_smtp_date(str[, local_tz]) | Converts an SMTP date string to a unix timestamp. |
Functions
The module rspamd_parsers defines the following functions.
Function parsers.tokenize_text(input[, exceptions])
Create tokens from a text using optional exceptions list
Parameters:
- `input` {text/string}: input data
- `exceptions` {table}: optional table of pairs containing <start_pos,length> of exceptions in the input
Returns:
{table/strings}: list of strings representing words in the text
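
A minimal sketch of calling the tokenizer, with and without exceptions. It assumes the code runs inside Rspamd's Lua environment, where `require "rspamd_parsers"` resolves. The sample text and the offsets in `exceptions` are illustrative only; this reference does not say whether `start_pos` is 0-based or 1-based, so treat the pair below as a guess to verify.

```lua
-- Sketch only: requires Rspamd's embedded Lua runtime.
local rspamd_parsers = require "rspamd_parsers"

local input = "Hello world, visit example.com today"

-- Tokenize without any exceptions.
local words = rspamd_parsers.tokenize_text(input)
for i, word in ipairs(words or {}) do
  print(i, word)
end

-- Tokenize with one exception given as a {start_pos, length} pair.
-- Assumption: the offset 19 counts from 0 and covers "example.com";
-- the indexing base is not stated in this reference.
local exceptions = { { 19, 11 } }
local filtered = rspamd_parsers.tokenize_text(input, exceptions)
for i, word in ipairs(filtered or {}) do
  print(i, word)
end
```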
Function parsers.parse_html(input)
Parses HTML and returns the corresponding plain text
Parameters:
- `input` {string|text}: input HTML
Returns:
{rspamd_text}: processed text with no HTML tags
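
As a rough illustration, the call below strips tags from a small HTML fragment. It assumes Rspamd's Lua environment and that the returned `rspamd_text` object can be converted with `tostring`; the sample markup is made up for the example.

```lua
-- Sketch only: requires Rspamd's embedded Lua runtime.
local rspamd_parsers = require "rspamd_parsers"

local html = "<html><body><p>Hello, <b>world</b>!</p></body></html>"

-- The result is an rspamd_text object, not a plain Lua string.
local text = rspamd_parsers.parse_html(html)
if text then
  -- Assumption: rspamd_text supports conversion via tostring().
  print(tostring(text))
end
```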
Function parsers.parse_html_content(input, mempool)
Parses HTML and returns the HTML content object for structure analysis
Parameters:
- `input` {string|text}: input HTML
- `mempool` {rspamd_mempool}: memory pool for HTML content management
Returns:
{html_content}: HTML content object with tag structure
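
One possible way to obtain the content object from within a rule is sketched below. The helper `html_structure` is hypothetical, and it assumes that `task` is an `rspamd_task` whose `get_mempool()` method supplies a pool that lives as long as the task. The methods of the returned `html_content` object are not covered in this section, so the sketch stops at returning it.

```lua
-- Sketch only: requires Rspamd's embedded Lua runtime.
local rspamd_parsers = require "rspamd_parsers"

-- Hypothetical helper: parse an HTML string using the task's memory pool.
-- Assumption: `task` is an rspamd_task object.
local function html_structure(task, html)
  local pool = task:get_mempool()
  -- Returns the html_content object (or a false value on failure).
  return rspamd_parsers.parse_html_content(html, pool)
end

return html_structure
```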
Function parsers.parse_mail_address(str, [pool])
Parses email address and returns a table of tables in the following format:
- `raw` - the original value without any processing
- `name` - name of internet address in UTF8, e.g. for `Vsevolod Stakhov <blah@foo.com>` it returns `Vsevolod Stakhov`
- `addr` - address part of the address
- `user` - user part (if present) of the address, e.g. `blah`
- `domain` - domain part (if present), e.g. `foo.com`
- `flags` - table with the following keys set to true if the given condition is fulfilled:
  - [valid] - valid SMTP address in conformity with https://tools.ietf.org/html/rfc5321#section-4.1.
  - [ip] - domain is IPv4/IPv6 address
  - [braced] - angled `<blah@foo.com>` address
  - [quoted] - quoted user part
  - [empty] - empty address
  - [backslash] - user part contains backslash
  - [8bit] - contains 8bit characters
 
Parameters:
- `str` {string}: input string
- `pool` {rspamd_mempool}: optional memory pool to use
Returns:
{table/tables}: parsed list of mail addresses
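
The fields described above can be read as in the following sketch. It assumes Rspamd's Lua environment and omits the optional `pool` argument; the input string reuses the example address from the field list.

```lua
-- Sketch only: requires Rspamd's embedded Lua runtime.
local rspamd_parsers = require "rspamd_parsers"

local addrs = rspamd_parsers.parse_mail_address("Vsevolod Stakhov <blah@foo.com>")

-- Each entry is a table with raw, name, addr, user, domain and flags.
for _, a in ipairs(addrs or {}) do
  print("name:", a.name)
  print("addr:", a.addr)
  print("user:", a.user)
  print("domain:", a.domain)
  -- Flags are only present (set to true) when the condition holds.
  if a.flags and a.flags.valid then
    print("address is a valid SMTP address")
  end
end
```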
Function parsers.parse_content_type(ct_string, mempool)
Parses content-type string to a table:
- `type`
- `subtype`
- `charset`
- `boundary`
- other attributes
 
Parameters:
- `ct_string` {string}: content type as string
- `mempool` {rspamd_mempool}: needed to store temporary data (e.g. task pool)
Returns:
{table}: parsed content type, or nil if the content type cannot be parsed
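
A short sketch of parsing a header value with the task pool follows. The helper `describe_content_type` is hypothetical, and it assumes that `task` is an `rspamd_task` providing `get_mempool()`.

```lua
-- Sketch only: requires Rspamd's embedded Lua runtime.
local rspamd_parsers = require "rspamd_parsers"

-- Hypothetical helper: returns "type/subtype" and the charset, or nil.
-- Assumption: `task` is an rspamd_task object.
local function describe_content_type(task, ct_string)
  local ct = rspamd_parsers.parse_content_type(ct_string, task:get_mempool())
  if not ct then
    return nil -- the string could not be parsed
  end
  return string.format("%s/%s", ct.type, ct.subtype), ct.charset
end

-- Example call shape: describe_content_type(task, "text/html; charset=utf-8")
return describe_content_type
```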
 
Function parsers.parse_smtp_date(str[, local_tz])
Converts an SMTP date string to unix timestamp
Parameters:
- `str` {string}: input string
- `local_tz` {boolean}: convert to local tz if `true`
Returns:
{number}: time as unix timestamp (converted to float)
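
For illustration, the sketch below converts a date header value and formats the result with the standard `os.date`. It assumes Rspamd's Lua environment; the date string is an arbitrary example, and the nil check is a defensive assumption about failed parses.

```lua
-- Sketch only: requires Rspamd's embedded Lua runtime.
local rspamd_parsers = require "rspamd_parsers"

local ts = rspamd_parsers.parse_smtp_date("Mon, 02 Jan 2006 15:04:05 -0700")

if ts then
  -- The timestamp is a float, so drop the fractional part before formatting.
  print(os.date("!%Y-%m-%d %H:%M:%S", math.floor(ts)))
end

-- Pass true as the second argument to convert to the local time zone.
local ts_local = rspamd_parsers.parse_smtp_date("Mon, 02 Jan 2006 15:04:05 -0700", true)
```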