Module rspamd_util
This module contains general-purpose utilities that can be useful in tests and in production rules.
Brief content:
Functions:
Function | Description |
---|---|
util.create_event_base() | Creates new event base for processing asynchronous events. |
util.load_rspamd_config(filename) | Load rspamd config from the specified file. |
util.config_from_ucl(any, string) | Load rspamd config from ucl represented by any lua table. |
util.encode_base64(input[, str_len, [newlines_type]]) | Encodes data in base64 breaking lines if needed. |
util.encode_qp(input[, str_len, [newlines_type]]) | Encodes data in quoted printable breaking lines if needed. |
util.decode_qp(input) | Decodes data from quoted printable. |
util.decode_base64(input) | Decodes data from base64 ignoring whitespace characters. |
util.encode_base32(input, [b32type = 'default']) | Encodes data in base32 breaking lines if needed. |
util.decode_base32(input, [b32type = 'default']) | Decodes data from base32 ignoring whitespace characters. |
util.decode_url(input) | Decodes data from url encoding. |
util.tokenize_text(input[, exceptions]) | Create tokens from a text using optional exceptions list. |
util.tanh(num) | Calculates hyperbolic tangent of the specified floating point value. |
util.parse_html(input) | Parses HTML and returns the corresponding text. |
util.levenshtein_distance(s1, s2) | Returns Levenshtein distance between two strings. |
util.fold_header(name, value, [how, [stop_chars]]) | Fold rfc822 header according to the folding rules. |
util.is_uppercase(str) | Returns true if a string is all uppercase. |
util.humanize_number(num) | Returns humanized representation of given number (like 1k instead of 1000). |
util.get_tld(host) | Returns effective second level domain part (eSLD) for the specified host. |
util.glob(pattern) | Returns results for the glob match for the specified pattern. |
util.parse_mail_address(str, [pool]) | Parses email address and returns a table of tables describing each parsed address. |
util.strlen_utf8(str) | Returns length of string encoded in utf-8 in characters. |
util.lower_utf8(str) | Converts utf8 string to lower case. |
util.normalize_utf8(str) | Gets a string in UTF8 and normalises it to NFKC_Casefold form. |
util.transliterate(str) | Converts utf8 encoded string to latin transliteration. |
util.strequal_caseless(str1, str2) | Compares two strings regardless of their case using ascii comparison. |
util.strequal_caseless_utf8(str1, str2) | Compares two utf8 strings regardless of their case using utf8 collation rules. |
util.get_ticks() | Returns current number of ticks as floating point number. |
util.get_time() | Returns current time as unix time in floating point representation. |
util.time_to_string(seconds) | Converts time from Unix time to HTTP date format. |
util.stat(fname) | Performs stat(2) on a specified filepath and returns table of values. |
util.unlink(fname) | Removes the specified file from the filesystem. |
util.lock_file(fname, [fd]) | Lock the specified file. |
util.unlock_file(fd, [close_fd]) | Unlock the specified file closing the file descriptor associated. |
util.create_file(fname, [mode]) | Creates the specified file with the default mode 0644. |
util.close_file(fd) | Closes descriptor fd. |
util.random_hex(size) | Returns random hex string of the specified size. |
util.zstd_compress(data, [level=1]) | Compresses input using zstd compression. |
util.zstd_decompress(data) | Decompresses input using zstd algorithm. |
util.gzip_decompress(data, [size_limit]) | Decompresses input using gzip algorithm. |
util.inflate(data, [size_limit]) | Decompresses input using inflate algorithm. |
util.gzip_compress(data, [level=1]) | Compresses input using gzip compression. |
util.normalize_prob(prob, [bias = 0.5]) | Normalize probabilities using a polynomial. |
util.is_utf_spoofed(str, [str2]) | Returns true if a string is spoofed (possibly with another string str2). |
util.get_string_stats(str) | Returns table with number of letters and digits in string. |
util.is_valid_utf8(str) | Returns true if a string is valid UTF8 string. |
util.has_obscured_unicode(str) | Returns true if a string has obscure UTF symbols (zero width spaces, order marks), ignores invalid utf characters. |
util.readline([prompt]) | Returns string read from stdin with history and editing support. |
util.readpassphrase([prompt]) | Returns string read from stdin disabling echo. |
util.file_exists(file) | Checks if a specified file exists and is available for reading. |
util.mkdir(dir[, recursive]) | Creates a specified directory. |
util.umask(mask) | Sets new umask. |
util.isatty() | Returns if stdout is a tty. |
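util.pack(fmt, ...) | Backport of the Lua 5.3 string.pack function. |
util.packsize(fmt) | Returns size of the packed binary string returned for the same fmt argument by util.pack. |
util.unpack(fmt, s [, pos]) | Unpacks string s according to the format string fmt as described in util.pack. |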
util.caseless_hash(str[, seed]) | Calculates caseless non-crypto hash from a string or rspamd text. |
util.caseless_hash_fast(str[, seed]) | Calculates caseless non-crypto hash from a string or rspamd text. |
util.get_hostname() | Returns hostname for this machine. |
util.parse_content_type(ct_string, mempool) | Parses content-type string to a table. |
util.mime_header_encode(hdr[, is_structured]) | Encodes header if needed. |
util.btc_polymod(input_values) | Performs bitcoin polymod function. |
util.parse_smtp_date(str[, local_tz]) | Converts an SMTP date string to unix timestamp. |
Functions
The module rspamd_util defines the following functions.
Function util.create_event_base()
Creates new event base for processing asynchronous events
Parameters:
No parameters
Returns:
{ev_base}
: new event processing base
Function util.load_rspamd_config(filename)
Load rspamd config from the specified file
Parameters:
filename {string}
: path of the configuration file to load
Returns:
{config}
: new configuration object suitable for access
Function util.config_from_ucl(any, string)
Load rspamd config from ucl represented by any lua table
Parameters:
any {table}
: Lua table holding the UCL representation of the config
string {string}
: no description
Returns:
{config}
: new configuration object suitable for access
Function util.encode_base64(input[, str_len, [newlines_type]])
Encodes data in base64 breaking lines if needed
Parameters:
input {text or string}
: input data
str_len {number}
: optional size of lines or 0 if split is not needed
Returns:
{rspamd_text}
: encoded data chunk
Function util.encode_qp(input[, str_len, [newlines_type]])
Encodes data in quoted printable breaking lines if needed
Parameters:
input {text or string}
: input data
str_len {number}
: optional size of lines or 0 if split is not needed
Returns:
{rspamd_text}
: encoded data chunk
Function util.decode_qp(input)
Decodes data from quoted printable
Parameters:
input {text or string}
: input data
Returns:
{rspamd_text}
: decoded data chunk
Function util.decode_base64(input)
Decodes data from base64 ignoring whitespace characters
Parameters:
input {text or string}
: data to decode; if rspamd{text} is used then the string is modified in-place
Returns:
{rspamd_text}
: decoded data chunk
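A minimal round-trip sketch for the base64 helpers. It assumes the code runs inside rspamd's Lua environment (for example in a plugin or under `rspamadm lua`), where `rspamd_util` can be required, and that `tostring()` converts the returned rspamd_text into a plain Lua string. The sample string is arbitrary.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

local original = "Hello, rspamd!"

-- Both calls return rspamd_text objects, so convert them to Lua strings
local encoded = tostring(util.encode_base64(original))
local decoded = tostring(util.decode_base64(encoded))

print(encoded)
print(decoded == original)
```

A plain Lua string is passed to `util.decode_base64` here, so the in-place modification mentioned above for rspamd_text input does not come into play.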
Function util.encode_base32(input, [b32type = 'default'])
Encodes data in base32 breaking lines if needed
Parameters:
input {text or string}
: input data
b32type {string}
: base32 type (default, bleach, rfc)
Returns:
{rspamd_text}
: encoded data chunk
Function util.decode_base32(input, [b32type = 'default'])
Decodes data from base32 ignoring whitespace characters
Parameters:
input {text or string}
: data to decode
b32type {string}
: base32 type (default, bleach, rfc)
Returns:
{rspamd_text}
: decoded data chunk
Function util.decode_url(input)
Decodes data from url encoding
Parameters:
input {text or string}
: data to decode
Returns:
{rspamd_text}
: decoded data chunk
Function util.tokenize_text(input[, exceptions])
Create tokens from a text using optional exceptions list
Parameters:
input {text/string}
: input data
exceptions {table}
: a table of pairs containing <start_pos,length> of exceptions in the input
Returns:
{table/strings}
: list of strings representing words in the text
Function util.tanh(num)
Calculates hyperbolic tangent of the specified floating point value
Parameters:
num {number}
: input number
Returns:
{number}
: hyperbolic tangent of the variable
Function util.parse_html(input)
Parses HTML and returns the corresponding text
Parameters:
input {string|text}
: input HTML
Returns:
{rspamd_text}
: processed text with no HTML tags
Function util.levenshtein_distance(s1, s2)
Returns Levenshtein distance between two strings
Parameters:
s1 {string}
: the first string
s2 {string}
: the second string
Returns:
{number}
: number of differences in two strings
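One possible use of the distance is spotting lookalike names. The sketch below assumes it runs inside rspamd's Lua environment; the two domain names and the threshold of 2 are purely illustrative choices, not values taken from rspamd.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

local known = "paypal.com"     -- illustrative trusted name
local candidate = "paypa1.com" -- illustrative lookalike

local distance = util.levenshtein_distance(known, candidate)

-- Hypothetical policy: a small non-zero distance is treated as suspicious
if distance > 0 and distance <= 2 then
  print(candidate .. " looks similar to " .. known)
end
```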
Function util.fold_header(name, value, [how, [stop_chars]])
Fold rfc822 header according to the folding rules
Parameters:
name {string}
: name of the header
value {string}
: value of the header
how {string}
: "cr" for \r, "lf" for \n and "crlf" for \r\n (default)
stop_chars {string}
: also fold header when the
Returns:
{string}
: Folded value of the header
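A short sketch of folding a long header value, assuming rspamd's Lua environment. The header name `X-Example` and the generated value are made up for illustration; the exact fold positions depend on rspamd's folding rules and are not shown here.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

-- Build a value that is clearly longer than a typical header line
local value = string.rep("word ", 30)

-- "crlf" is the documented default line ending, passed explicitly for clarity
local folded = util.fold_header("X-Example", value, "crlf")

print(folded)
```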
Function util.is_uppercase(str)
Returns true if a string is all uppercase
Parameters:
str {string}
: input string
Returns:
{bool}
: true if a string is all uppercase
Function util.humanize_number(num)
Returns humanized representation of given number (like 1k instead of 1000)
Parameters:
num {number}
: number to humanize
Returns:
{string}
: humanized representation of a number
Function util.get_tld(host)
Returns effective second level domain part (eSLD) for the specified host
Parameters:
host {string}
: hostname
Returns:
{string}
: eSLD part of the hostname or the full hostname if eSLD was not found
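A small sketch that compares two hosts by their eSLD, assuming rspamd's Lua environment. `same_esld` is a hypothetical helper defined only for this example. Because the function falls back to the full hostname when no eSLD is found, the result depends on the suffix list available to rspamd at runtime.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

-- Hypothetical helper: true if both hosts map to the same eSLD
local function same_esld(host_a, host_b)
  return util.get_tld(host_a) == util.get_tld(host_b)
end

print(util.get_tld("mail.example.com"))
print(same_esld("mail.example.com", "www.example.com"))
```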
Function util.glob(pattern)
Returns results for the glob match for the specified pattern
Parameters:
pattern {string}
: glob pattern to match ('?' and '*' are supported)
Returns:
{table/string}
: list of matched files
Function util.parse_mail_address(str, [pool])
Parses email address and returns a table of tables in the following format:
- raw - the original value without any processing
- name - name of internet address in UTF8, e.g. for Vsevolod Stakhov <blah@foo.com> it returns Vsevolod Stakhov
- addr - address part of the address
- user - user part (if present) of the address, e.g. blah
- domain - domain part (if present), e.g. foo.com
- flags - table with following keys set to true if given condition fulfilled:
  - [valid] - valid SMTP address in conformity with https://tools.ietf.org/html/rfc5321#section-4.1
  - [ip] - domain is IPv4/IPv6 address
  - [braced] - angled <blah@foo.com> address
  - [quoted] - quoted user part
  - [empty] - empty address
  - [backslash] - user part contains backslash
  - [8bit] - contains 8bit characters
Parameters:
str {string}
: input string
pool {rspamd_mempool}
: memory pool to use
Returns:
{table/tables}
: parsed list of mail addresses
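The fields described above can be read as in the following sketch, which assumes rspamd's Lua environment and omits the optional memory pool. The `or {}` fallback is a defensive assumption in case nothing could be parsed.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

-- Fall back to an empty table if the input could not be parsed
local addrs = util.parse_mail_address("Vsevolod Stakhov <blah@foo.com>") or {}

for _, a in ipairs(addrs) do
  print(a.name, a.user, a.domain, a.addr)
  -- flags holds only the keys whose condition is fulfilled
  if a.flags and a.flags.valid then
    print("valid SMTP address")
  end
end
```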
Function util.strlen_utf8(str)
Returns length of string encoded in utf-8 in characters. If invalid characters are found, then this function returns number of bytes.
Parameters:
str {string}
: utf8 encoded string
Returns:
{number}
: number of characters in string
Function util.lower_utf8(str)
Converts utf8 string to lower case
Parameters:
str {string}
: utf8 encoded string
Returns:
{string}
: lowercased utf8 string
Function util.normalize_utf8(str)
Gets a string in UTF8 and normalises it to NFKC_Casefold form
Parameters:
str {string}
: utf8 encoded string
Returns:
{string,integer}
: lowercased utf8 string + result of the normalisation (use bit.band to check):
- RSPAMD_UNICODE_NORM_NORMAL = 0
- RSPAMD_UNICODE_NORM_UNNORMAL = (1 << 0)
- RSPAMD_UNICODE_NORM_ZERO_SPACES = (1 << 1)
- RSPAMD_UNICODE_NORM_ERROR = (1 << 2)
- RSPAMD_UNICODE_NORM_OVERFLOW = (1 << 3)
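The flag values listed for this function can be tested with `bit.band`, as in this sketch. It assumes rspamd's Lua environment with the `bit` library available. The local constant names are hypothetical mirrors of the RSPAMD_UNICODE_NORM_* values, and the input embeds a zero width space (U+200B) written as decimal byte escapes.

```lua
-- Assumption: executed inside rspamd's Lua environment with the `bit` library
local util = require "rspamd_util"
local bit = require "bit"

-- Hypothetical local mirrors of the documented flag values
local NORM_UNNORMAL    = 1 -- (1 << 0)
local NORM_ZERO_SPACES = 2 -- (1 << 1)
local NORM_ERROR       = 4 -- (1 << 2)
local NORM_OVERFLOW    = 8 -- (1 << 3)

-- "pay<U+200B>pal": the zero width space is encoded as bytes E2 80 8B
local normalized, flags = util.normalize_utf8("pay\226\128\139pal")

if bit.band(flags, NORM_ERROR) ~= 0 or bit.band(flags, NORM_OVERFLOW) ~= 0 then
  print("normalisation failed")
else
  print("zero width spaces seen:", bit.band(flags, NORM_ZERO_SPACES) ~= 0)
  print("input was not normalised:", bit.band(flags, NORM_UNNORMAL) ~= 0)
end
```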
Function util.transliterate(str)
Converts utf8 encoded string to latin transliteration
Parameters:
str {string/text}
: utf8 encoded string
Returns:
{text}
: transliterated string
Function util.strequal_caseless(str1, str2)
Compares two strings regardless of their case using ascii comparison. Returns true if str1 is equal to str2
Parameters:
str1 {string}
: utf8 encoded string
str2 {string}
: utf8 encoded string
Returns:
{bool}
: result of comparison
Function util.strequal_caseless_utf8(str1, str2)
Compares two utf8 strings regardless of their case using utf8 collation rules. Returns true if str1 is equal to str2
Parameters:
str1 {string}
: utf8 encoded string
str2 {string}
: utf8 encoded string
Returns:
{bool}
: result of comparison
Function util.get_ticks()
Returns current number of ticks as floating point number
Parameters:
No parameters
Returns:
{number}
: number of current clock ticks (monotonically increasing)
Function util.get_time()
Returns current time as unix time in floating point representation
Parameters:
No parameters
Returns:
{number}
: number of seconds since 01.01.1970
Function util.time_to_string(seconds)
Converts time from Unix time to HTTP date format
Parameters:
seconds {number}
: unix timestamp
Returns:
{string}
: date as HTTP date
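The three time helpers can be combined as in this sketch, which assumes rspamd's Lua environment. The loop is placeholder work to measure, and the unit of the tick difference is whatever `util.get_ticks` reports, which this example does not assume.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

-- Wall-clock time as a unix timestamp, rendered as an HTTP date
local now = util.get_time()
print(util.time_to_string(now))

-- Monotonic ticks are better suited for measuring elapsed time
local started = util.get_ticks()
local sum = 0
for i = 1, 100000 do
  sum = sum + i -- placeholder work
end
local elapsed = util.get_ticks() - started
print("elapsed ticks:", elapsed)
```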
Function util.stat(fname)
Performs stat(2) on a specified filepath and returns table of values:
- size: size of file in bytes
- type: type of filepath: regular, directory, special
- mtime: modification time as unix time
Parameters:
fname {string}
: path of the file to stat
Returns:
{string,table}
: error string (set only if an error has occurred) and table of values
Example:
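```lua
local util = require "rspamd_util"

-- util.stat returns the error string first, then the table of values
local err, st = util.stat('/etc/passwd')
if err then
  -- handle error
  print(err)
else
  print(st['size'])
end
```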
Function util.unlink(fname)
Removes the specified file from the filesystem
Parameters:
fname {string}
: filename to remove
Returns:
{boolean,[string]}
: true if file has been deleted or false,'error string'
Function util.lock_file(fname, [fd])
Lock the specified file. This function returns {number} which must be passed to util.unlock_file after usage or you'll have a resource leak
Parameters:
fname {string}
: filename to lock
fd {number}
: use the specified fd instead of opening one
Returns:
{number|nil,string}
: number if locking was successful or nil + error otherwise
Function util.unlock_file(fd, [close_fd])
Unlock the specified file closing the file descriptor associated.
Parameters:
fd {number}
: descriptor to unlock
close_fd {boolean}
: close descriptor on unlocking (default: TRUE)
Returns:
{boolean[,string]}
: true if a file was unlocked
Function util.create_file(fname, [mode])
Creates the specified file with the default mode 0644
Parameters:
fname {string}
: filename to create
mode {number}
: open mode (you should use octal number here)
Returns:
{number|nil,string}
: file descriptor or pair nil + error string
Function util.close_file(fd)
Closes descriptor fd
Parameters:
fd {number}
: descriptor to close
Returns:
{boolean[,string]}
: true if a file was closed
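The file helpers (`create_file`, `lock_file`, `unlock_file`, `close_file`, `unlink`) can be combined as in the sketch below, which assumes rspamd's Lua environment. The path is hypothetical, and the default mode is used to avoid writing an octal value in Lua.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

local path = "/tmp/rspamd_util_example.lock" -- hypothetical path

local fd, err = util.create_file(path) -- default mode 0644
if not fd then
  print("cannot create file: " .. tostring(err))
else
  -- Reuse the descriptor we already hold instead of opening a new one
  local locked, lock_err = util.lock_file(path, fd)
  if locked then
    -- ... exclusive work goes here ...
    util.unlock_file(fd) -- close_fd defaults to true, so fd is closed as well
  else
    print("cannot lock file: " .. tostring(lock_err))
    util.close_file(fd)
  end
  util.unlink(path)
end
```

Every successful `create_file` is paired with either `unlock_file` or `close_file`, so the descriptor is released on both paths.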
Function util.random_hex(size)
Returns random hex string of the specified size
Parameters:
size {number}
: length of desired string in bytes
Returns:
{string}
: string with random hex digits
Function util.zstd_compress(data, [level=1])
Compresses input using zstd compression
Parameters:
data {string/rspamd_text}
: input data
level {number}
: compression level (default: 1)
Returns:
{rspamd_text}
: compressed data
Function util.zstd_decompress(data)
Decompresses input using zstd algorithm
Parameters:
data {string/rspamd_text}
: compressed data
Returns:
{error,rspamd_text}
: pair of error + decompressed text
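A compression round trip might look like the following sketch, assuming rspamd's Lua environment. Note the documented return order of `util.zstd_decompress`: the error comes first. The `len()` method and `tostring()` conversion on rspamd_text are assumed to be available.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

-- Repetitive input compresses well
local data = string.rep("rspamd ", 100)

local compressed = util.zstd_compress(data)
local err, restored = util.zstd_decompress(compressed)

if err then
  print("decompression failed: " .. tostring(err))
else
  print("original bytes:", #data)
  print("compressed bytes:", compressed:len())
  print("round trip ok:", tostring(restored) == data)
end
```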
Function util.gzip_decompress(data, [size_limit])
Decompresses input using gzip algorithm
Parameters:
data {string/rspamd_text}
: compressed data
size_limit {integer}
: optional size limit
Returns:
{rspamd_text}
: decompressed text
Function util.inflate(data, [size_limit])
Decompresses input using inflate algorithm
Parameters:
data {string/rspamd_text}
: compressed data
size_limit {integer}
: optional size limit
Returns:
{rspamd_text}
: decompressed text
Function util.gzip_compress(data, [level=1])
Compresses input using gzip compression
Parameters:
data {string/rspamd_text}
: input data
level {number}
: compression level (default: 1)
Returns:
{rspamd_text}
: compressed data
Function util.normalize_prob(prob, [bias = 0.5])
Normalize probabilities using a polynomial
Parameters:
prob {number}
: probability param
bias {number}
: number to subtract for making the final solution
Returns:
{number}
: normalized number
Function util.is_utf_spoofed(str, [str2])
Returns true if a string is spoofed (possibly with another string str2)
Parameters:
str {string}
: string to check
str2 {string}
: optional second string to check str against
Returns:
{boolean}
: true if a string is spoofed
Function util.get_string_stats(str)
Returns table with number of letters and digits in string
Parameters:
str {string}
: input string
Returns:
{table}
: table with string stats; keys are "digits" and "letters"
Function util.is_valid_utf8(str)
Returns true if a string is valid UTF8 string
Parameters:
str {string}
: string to check
Returns:
{boolean}
: true if a string is a valid UTF8 string
Function util.has_obscured_unicode(str)
Returns true if a string has obscure UTF symbols (zero width spaces, order marks), ignores invalid utf characters
Parameters:
str {string}
: string to check
Returns:
{boolean}
: true if a string has obscured unicode characters (+ character and offset if found)
Function util.readline([prompt])
Returns string read from stdin with history and editing support
Parameters:
prompt {string}
: optional prompt to display
Returns:
{string}
: string read from the input (with line endings stripped)
Function util.readpassphrase([prompt])
Returns string read from stdin disabling echo
Parameters:
prompt {string}
: optional prompt to display
Returns:
{string}
: string read from the input (with line endings stripped)
Function util.file_exists(file)
Checks if a specified file exists and is available for reading
Parameters:
file {string}
: path of the file to check
Returns:
{boolean,string}
: true if file exists + string error if not
Function util.mkdir(dir[, recursive])
Creates a specified directory
Parameters:
dir {string}
: directory to create
recursive {boolean}
: optional flag, no description
Returns:
{boolean[,error]}
: true if directory has been created
Function util.umask(mask)
Sets new umask. Accepts either a numeric octal string, e.g. '022', or a plain number, e.g. 0x12 (since Lua does not support octal integer literals)
Parameters:
mask {string/number}
: new umask as an octal string or a plain number
Returns:
{number}
: old umask
Function util.isatty()
Returns true if stdout is a tty
Parameters:
No parameters
Returns:
{boolean}
: true in case of output being tty
Function util.pack(fmt, ...)
Backport of the Lua 5.3 string.pack function.
Returns a binary string containing the values v1, v2, etc. packed (that is, serialized in binary form) according to the format string fmt.
A format string is a sequence of conversion options. The conversion options are as follows:
- <: sets little endian
- >: sets big endian
- =: sets native endian
- ![n]: sets maximum alignment to n (default is native alignment)
- b: a signed byte (char)
- B: an unsigned byte (char)
- h: a signed short (native size)
- H: an unsigned short (native size)
- l: a signed long (native size)
- L: an unsigned long (native size)
- j: a lua_Integer
- J: a lua_Unsigned
- T: a size_t (native size)
- i[n]: a signed int with n bytes (default is native size)
- I[n]: an unsigned int with n bytes (default is native size)
- f: a float (native size)
- d: a double (native size)
- n: a lua_Number
- cn: a fixed-sized string with n bytes
- z: a zero-terminated string
- s[n]: a string preceded by its length coded as an unsigned integer with n bytes (default is a size_t)
- x: one byte of padding
- Xop: an empty item that aligns according to option op (which is otherwise ignored)
- ' ': (empty space) ignored
(A "[n]" means an optional integral numeral.) Except for padding, spaces, and configurations (options "xX <=>!"), each option corresponds to an argument (in string.pack) or a result (in string.unpack).
For options "!n", "sn", "in", and "In", n can be any integer between 1 and 16. All integral options check overflows; string.pack checks whether the given value fits in the given size; string.unpack checks whether the read value fits in a Lua integer.
Any format string starts as if prefixed by "!1=", that is, with maximum alignment of 1 (no alignment) and native endianness.
Alignment works as follows: For each option, the format gets extra padding until the data starts at an offset that is a multiple of the minimum between the option size and the maximum alignment; this minimum must be a power of 2. Options "c" and "z" are not aligned; option "s" follows the alignment of its starting integer.
All padding is filled with zeros by string.pack (and ignored by unpack).
Parameters:
fmt {string}
: format string
... {multiple}
: values to pack
Returns:
{string}
: binary string with the packed values
Function util.packsize(fmt)
Returns size of the packed binary string returned for the same fmt argument by util.pack
Parameters:
fmt {string}
: format string
Returns:
{number}
: size of the packed string in bytes
Function util.unpack(fmt, s [, pos])
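Unpacks string s according to the format string fmt as described in util.pack
Parameters:
fmt {string}
: format string
s {string}
: packed binary string
pos {number}
: optional position in s to start unpacking from
Returns:
{multiple}
: list of unpacked values according to fmt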
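The three packing functions are meant to be used with the same format string, as in this sketch. It assumes rspamd's Lua environment and that the backport follows Lua 5.3 semantics; the format and values are arbitrary examples.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

-- "<"  : little endian
-- "I2" : unsigned integer stored in 2 bytes
-- "I4" : unsigned integer stored in 4 bytes
local fmt = "<I2I4"

local packed = util.pack(fmt, 258, 65536)

-- With the default maximum alignment of 1 there is no padding,
-- so the packed length should match util.packsize for the same format
print(#packed, util.packsize(fmt))

-- Values come back in the order they were packed
local first, second = util.unpack(fmt, packed)
print(first, second)
```

Keeping the format string in one variable avoids a mismatch between the packing and unpacking sides.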
Function util.caseless_hash(str[, seed])
Calculates caseless non-crypto hash from a string or rspamd text
Parameters:
str {no type}
: string or lua_text
seed {no type}
: optional seed (0xdeadbabe by default)
Returns:
{int64}
: boxed int64_t
Function util.caseless_hash_fast(str[, seed])
Calculates caseless non-crypto hash from a string or rspamd text
Parameters:
str {no type}
: string or lua_text
seed {no type}
: optional seed (0xdeadbabe by default)
Returns:
{number}
: number from int64_t
Function util.get_hostname()
Returns hostname for this machine
Parameters:
No parameters
Returns:
{string}
: hostname
Function util.parse_content_type(ct_string, mempool)
Parses content-type string to a table:
- type
- subtype
- charset
- boundary
- other attributes
Parameters:
ct_string {string}
: content type as string
mempool {rspamd_mempool}
: needed to store temporary data (e.g. task pool)
Returns:
{table}
: table or nil if the content type cannot be parsed
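A sketch of parsing a content type outside of a task, assuming rspamd's Lua environment. It further assumes that the `rspamd_mempool` module provides `create()` and that pools have a `destroy()` method; inside a rule the task's own pool would normally be passed instead. The content-type string is an arbitrary example.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"
-- Assumption: rspamd_mempool offers create() and pool:destroy()
local rspamd_mempool = require "rspamd_mempool"

local pool = rspamd_mempool.create()

local ct = util.parse_content_type(
  'multipart/mixed; boundary="abc"; charset=utf-8', pool)

if ct then
  print(ct.type, ct.subtype, ct.boundary, ct.charset)
else
  print("cannot parse content type")
end

-- Release the temporary pool once the parsed values are no longer needed
pool:destroy()
```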
Function util.mime_header_encode(hdr[, is_structured])
Encodes header if needed
Parameters:
hdr {string}
: input header
is_structured {boolean}
: if true, then we encode as structured header (e.g. encode all non alpha-numeric characters)
Returns:
{string}
: encoded header
Function util.btc_polymod(input_values)
Performs bitcoin polymod function
Parameters:
input_values {table|numbers}
: no description
Returns:
{boolean}
: true if polymod has been successful
Function util.parse_smtp_date(str[, local_tz])
Converts an SMTP date string to unix timestamp
Parameters:
str {string}
: input string
local_tz {boolean}
: convert to local tz if true
Returns:
{number}
: time as unix timestamp (converted to float)
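A brief sketch that parses a date header value and renders it back as an HTTP date, assuming rspamd's Lua environment. The date string is an arbitrary example, and the nil check is a defensive assumption for unparsable input.

```lua
-- Assumption: executed inside rspamd's Lua environment
local util = require "rspamd_util"

local ts = util.parse_smtp_date("Thu, 1 Jan 2015 12:00:00 +0000")

if ts then
  print(ts)
  print(util.time_to_string(ts))
else
  print("cannot parse date")
end
```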