Module `rspamd_util`

This module contains general-purpose utilities that can be useful for testing and production rules.

Brief content:

Functions:

Function	Description
`util.create_event_base()`	Creates new event base for processing asynchronous events.
`util.load_rspamd_config(filename)`	Load rspamd config from the specified file.
`util.config_from_ucl(any, string)`	Load rspamd config from ucl represented by any lua table.
`util.encode_base64(input[, str_len, [newlines_type]])`	Encodes data in base64 breaking lines if needed.
`util.encode_qp(input[, str_len, [newlines_type]])`	Encodes data in quoted printable breaking lines if needed.
`util.decode_qp(input)`	Decodes data from quoted printable.
`util.decode_base64(input)`	Decodes data from base64 ignoring whitespace characters.
`util.encode_base32(input, [b32type = 'default'])`	Encodes data in base32 breaking lines if needed.
`util.decode_base32(input, [b32type = 'default'])`	Decodes data from base32 ignoring whitespace characters.
`util.decode_url(input)`	Decodes data from url encoding.
`util.tokenize_text(input[, exceptions])`	Create tokens from a text using optional exceptions list.
`util.tanh(num)`	Calculates hyperbolic tangent of the specified floating point value.
`util.parse_html(input)`	Parses HTML and returns the corresponding text.
`util.levenshtein_distance(s1, s2)`	Returns Levenshtein distance between two strings.
`util.fold_header(name, value, [how, [stop_chars]])`	Fold rfc822 header according to the folding rules.
`util.is_uppercase(str)`	Returns true if a string is all uppercase.
`util.humanize_number(num)`	Returns humanized representation of given number (like 1k instead of 1000).
`util.get_tld(host)`	Returns effective second level domain part (eSLD) for the specified host.
`util.glob(pattern)`	Returns results for the glob match for the specified pattern.
`util.parse_mail_address(str, [pool])`	Parses email address and returns a table of tables, one per parsed address.
`util.strlen_utf8(str)`	Returns length of string encoded in utf-8 in characters.
`util.lower_utf8(str)`	Converts utf8 string to lower case.
`util.normalize_utf8(str)`	Gets a string in UTF8 and normalises it to NFKC_Casefold form.
`util.transliterate(str)`	Converts utf8 encoded string to latin transliteration.
`util.strequal_caseless(str1, str2)`	Compares two strings regardless of their case using ascii comparison.
`util.strequal_caseless_utf8(str1, str2)`	Compares two utf8 strings regardless of their case using utf8 collation rules.
`util.get_ticks()`	Returns current number of ticks as floating point number.
`util.get_time()`	Returns current time as unix time in floating point representation.
`util.time_to_string(seconds)`	Converts time from Unix time to HTTP date format.
`util.stat(fname)`	Performs stat(2) on a specified filepath and returns table of values.
`util.unlink(fname)`	Removes the specified file from the filesystem.
`util.lock_file(fname, [fd])`	Lock the specified file.
`util.unlock_file(fd, [close_fd])`	Unlock the specified file closing the file descriptor associated.
`util.create_file(fname, [mode])`	Creates the specified file with the default mode 0644.
`util.close_file(fd)`	Closes descriptor fd.
`util.random_hex(size)`	Returns random hex string of the specified size.
`util.zstd_compress(data, [level=1])`	Compresses input using zstd compression.
`util.zstd_decompress(data)`	Decompresses input using zstd algorithm.
`util.gzip_decompress(data, [size_limit])`	Decompresses input using gzip algorithm.
`util.inflate(data, [size_limit])`	Decompresses input using inflate algorithm.
`util.gzip_compress(data, [level=1])`	Compresses input using gzip compression.
`util.normalize_prob(prob, [bias = 0.5])`	Normalize probabilities using a polynomial.
`util.is_utf_spoofed(str, [str2])`	Returns true if a string is spoofed (possibly with another string `str2`).
`util.get_string_stats(str)`	Returns table with number of letters and digits in string.
`util.is_valid_utf8(str)`	Returns true if a string is valid UTF8 string.
`util.has_obscured_unicode(str)`	Returns true if a string has obscure UTF symbols (zero width spaces, order marks), ignores invalid utf characters.
`util.readline([prompt])`	Returns string read from stdin with history and editing support.
`util.readpassphrase([prompt])`	Returns string read from stdin disabling echo.
`util.file_exists(file)`	Checks if a specified file exists and is available for reading.
`util.mkdir(dir[, recursive])`	Creates a specified directory.
`util.umask(mask)`	Sets new umask.
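`util.isatty()`	Returns true if stdout is a tty.
`util.pack(fmt, ...)`	Packs values into a binary string according to the format string `fmt` (backport of Lua 5.3 `string.pack`).
`util.packsize(fmt)`	Returns size of the packed binary string returned for the same `fmt` argument by `util.pack`.
`util.unpack(fmt, s [, pos])`	Unpacks string `s` according to the format string `fmt` as described in `util.pack`.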
`util.caseless_hash(str[, seed])`	Calculates caseless non-crypto hash from a string or rspamd text.
`util.caseless_hash_fast(str[, seed])`	Calculates caseless non-crypto hash from a string or rspamd text.
`util.get_hostname()`	Returns hostname for this machine.
`util.get_uptime()`	Returns system uptime in seconds.
`util.get_pid()`	Returns current process PID.
`util.get_memory_usage()`	Returns memory usage information for current process.
`util.parse_content_type(ct_string, mempool)`	Parses content-type string to a table.
`util.mime_header_encode(hdr[, is_structured])`	Encodes header if needed.
`util.btc_polymod(input_values)`	Performs bitcoin polymod function.
`util.parse_smtp_date(str[, local_tz])`	Converts an SMTP date string to unix timestamp.

Functions

The module rspamd_util defines the following functions.

Function `util.create_event_base()`

Creates new event base for processing asynchronous events

Parameters:

No parameters

Returns:

{ev_base}: new event processing base

Function `util.load_rspamd_config(filename)`

Load rspamd config from the specified file

Parameters:

filename {string}: path to the configuration file

Returns:

{config}: new configuration object suitable for access

Function `util.config_from_ucl(any, string)`

Load rspamd config from ucl represented by any lua table

Parameters:

any {table}: Lua table representing the UCL configuration
string {string}: no description

Returns:

{config}: new configuration object suitable for access

Function `util.encode_base64(input[, str_len, [newlines_type]])`

Encodes data in base64 breaking lines if needed

Parameters:

input {text or string}: input data
str_len {number}: optional size of lines or 0 if split is not needed

Returns:

{rspamd_text}: encoded data chunk

Function `util.encode_qp(input[, str_len, [newlines_type]])`

Encodes data in quoted printable breaking lines if needed

Parameters:

input {text or string}: input data
str_len {number}: optional size of lines or 0 if split is not needed

Returns:

{rspamd_text}: encoded data chunk

Function `util.decode_qp(input)`

Decodes data from quoted printable

Parameters:

input {text or string}: input data

Returns:

{rspamd_text}: decoded data chunk

Function `util.decode_base64(input)`

Decodes data from base64 ignoring whitespace characters

Parameters:

input {text or string}: data to decode; if rspamd{text} is used then the string is modified in-place

Returns:

{rspamd_text}: decoded data chunk
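
A minimal round-trip sketch, assuming the code runs inside an rspamd Lua environment where the module can be loaded as `rspamd_util`; the sample input is arbitrary:

```lua
local util = require "rspamd_util"

-- Encode a plain Lua string; the result is an rspamd_text object
local encoded = util.encode_base64("hello world")
print(tostring(encoded)) -- aGVsbG8gd29ybGQ=

-- Decode a string copy, so no rspamd_text is modified in-place
local decoded = util.decode_base64(tostring(encoded))
assert(tostring(decoded) == "hello world")
```

Passing a Lua string rather than the `rspamd_text` itself sidesteps the in-place modification noted for the `input` parameter.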

Function `util.encode_base32(input, [b32type = 'default'])`

Encodes data in base32 breaking lines if needed

Parameters:

input {text or string}: input data
b32type {string}: base32 type (default, bleach, rfc)

Returns:

{rspamd_text}: encoded data chunk

Function `util.decode_base32(input, [b32type = 'default'])`

Decodes data from base32 ignoring whitespace characters

Parameters:

input {text or string}: data to decode
b32type {string}: base32 type (default, bleach, rfc)

Returns:

{rspamd_text}: decoded data chunk

Function `util.decode_url(input)`

Decodes data from url encoding

Parameters:

input {text or string}: data to decode

Returns:

{rspamd_text}: decoded data chunk

Function `util.tokenize_text(input[, exceptions])`

Create tokens from a text using optional exceptions list

Parameters:

input {text/string}: input data
exceptions {table}: a table of pairs containing <start_pos,length> of exceptions in the input

Returns:

{table/strings}: list of strings representing words in the text

Function `util.tanh(num)`

Calculates hyperbolic tangent of the specified floating point value

Parameters:

num {number}: input number

Returns:

{number}: hyperbolic tangent of the variable

Function `util.parse_html(input)`

Parses HTML and returns the corresponding text

Parameters:

input {string|text}: input HTML

Returns:

{rspamd_text}: processed text with no HTML tags

Function `util.levenshtein_distance(s1, s2)`

Returns Levenshtein distance between two strings

Parameters:

s1 {string}: the first string
s2 {string}: the second string

Returns:

{number}: number of differences in two strings
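
As an illustration, the distance can back a simple look-alike check. This is only a sketch: `is_lookalike`, its threshold and the sample domains are hypothetical and not part of the module.

```lua
local util = require "rspamd_util"

-- Hypothetical helper: two names are look-alikes when they are not
-- identical but differ by at most `max_edits` according to the distance
local function is_lookalike(a, b, max_edits)
  local d = util.levenshtein_distance(a, b)
  return d > 0 and d <= max_edits
end

print(is_lookalike("paypal.com", "paypa1.com", 2))
```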

Function `util.fold_header(name, value, [how, [stop_chars]])`

Fold rfc822 header according to the folding rules

Parameters:

name {string}: name of the header
value {string}: value of the header
how {string}: "cr" for \r, "lf" for \n and "crlf" for \r\n (default)
stop_chars {string}: also fold header when one of the specified characters is encountered

Returns:

{string}: Folded value of the header

Function `util.is_uppercase(str)`

Returns true if a string is all uppercase

Parameters:

str {string}: input string

Returns:

{bool}: true if a string is all uppercase

Function `util.humanize_number(num)`

Returns humanized representation of given number (like 1k instead of 1000)

Parameters:

num {number}: number to humanize

Returns:

{string}: humanized representation of a number

Function `util.get_tld(host)`

Returns effective second level domain part (eSLD) for the specified host

Parameters:

host {string}: hostname

Returns:

{string}: eSLD part of the hostname or the full hostname if eSLD was not found

Function `util.glob(pattern)`

Returns results for the glob match for the specified pattern

Parameters:

pattern {string}: glob pattern to match ('?' and '*' are supported)

Returns:

{table/string}: list of matched files

Function `util.parse_mail_address(str, [pool])`

Parses email address and returns a table of tables in the following format:

raw - the original value without any processing
name - name of internet address in UTF8, e.g. for Vsevolod Stakhov <blah@foo.com> it returns Vsevolod Stakhov
addr - address part of the address
user - user part (if present) of the address, e.g. blah
domain - domain part (if present), e.g. foo.com
flags - table with following keys set to true if given condition fulfilled:
- [valid] - valid SMTP address in conformity with https://tools.ietf.org/html/rfc5321#section-4.1.
- [ip] - domain is IPv4/IPv6 address
- [braced] - angled <blah@foo.com> address
- [quoted] - quoted user part
- [empty] - empty address
- [backslash] - user part contains backslash
- [8bit] - contains 8bit characters

Parameters:

str {string}: input string
pool {rspamd_mempool}: memory pool to use

Returns:

{table/tables}: parsed list of mail addresses
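
A short sketch of walking the result, assuming an rspamd Lua environment. The `pool` argument is omitted here, and treating a `nil` result as a parse failure is an assumption of this example:

```lua
local util = require "rspamd_util"

local addrs = util.parse_mail_address("Vsevolod Stakhov <blah@foo.com>")

-- Fall back to an empty list if nothing was parsed (assumed nil result)
for _, a in ipairs(addrs or {}) do
  print(a.name, a.user, a.domain)
  -- flags is a table whose keys are set to true when the condition holds
  if a.flags and a.flags.valid then
    print("valid SMTP address: " .. a.addr)
  end
end
```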

Function `util.strlen_utf8(str)`

Returns length of string encoded in utf-8 in characters. If invalid characters are found, then this function returns number of bytes.

Parameters:

str {string}: utf8 encoded string

Returns:

{number}: number of characters in string

Function `util.lower_utf8(str)`

Converts utf8 string to lower case

Parameters:

str {string}: utf8 encoded string

Returns:

{string}: lowercased utf8 string

Function `util.normalize_utf8(str)`

Gets a string in UTF8 and normalises it to NFKC_Casefold form

Parameters:

str {string}: utf8 encoded string

Returns:

{string,integer}: normalised utf8 string + result of the normalisation (use bit.band to check):

RSPAMD_UNICODE_NORM_NORMAL = 0
RSPAMD_UNICODE_NORM_UNNORMAL = (1 << 0)
RSPAMD_UNICODE_NORM_ZERO_SPACES = (1 << 1)
RSPAMD_UNICODE_NORM_ERROR = (1 << 2)
RSPAMD_UNICODE_NORM_OVERFLOW = (1 << 3)
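
A sketch of checking the second return value against the flag values listed in this section. The local constant names are illustrative, and the availability of a `bit` module (as in LuaJIT) is an assumption:

```lua
local util = require "rspamd_util"
local bit = require "bit" -- assumption: LuaJIT-style bit library is available

-- Numeric values copied from the RSPAMD_UNICODE_NORM_* list
local NORM_UNNORMAL    = 1 -- (1 << 0)
local NORM_ZERO_SPACES = 2 -- (1 << 1)
local NORM_ERROR       = 4 -- (1 << 2)
local NORM_OVERFLOW    = 8 -- (1 << 3)

local normalized, res = util.normalize_utf8("Ｈｅｌｌｏ")

if bit.band(res, NORM_ERROR) ~= 0 or bit.band(res, NORM_OVERFLOW) ~= 0 then
  print("normalisation failed")
else
  if bit.band(res, NORM_UNNORMAL) ~= 0 then
    print("input was not in normalised form")
  end
  if bit.band(res, NORM_ZERO_SPACES) ~= 0 then
    print("input contained zero width spaces")
  end
  print(normalized)
end
```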

Function `util.transliterate(str)`

Converts utf8 encoded string to latin transliteration

Parameters:

str {string/text}: utf8 encoded string

Returns:

{text}: transliterated string

Function `util.strequal_caseless(str1, str2)`

Compares two strings regardless of their case using ascii comparison. Returns true if str1 is equal to str2

Parameters:

str1 {string}: utf8 encoded string
str2 {string}: utf8 encoded string

Returns:

{bool}: result of comparison

Function `util.strequal_caseless_utf8(str1, str2)`

Compares two utf8 strings regardless of their case using utf8 collation rules. Returns true if str1 is equal to str2

Parameters:

str1 {string}: utf8 encoded string
str2 {string}: utf8 encoded string

Returns:

{bool}: result of comparison

Function `util.get_ticks()`

Returns current number of ticks as floating point number

Parameters:

No parameters

Returns:

{number}: number of current clock ticks (monotonically increasing)
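
Because the value increases monotonically, the difference between two readings can be used to measure elapsed time. A minimal sketch with a placeholder workload; the unit of a tick is not stated here, so the result is printed as a raw difference:

```lua
local util = require "rspamd_util"

local started = util.get_ticks()

-- Placeholder workload to be measured
local acc = 0
for i = 1, 1000000 do
  acc = acc + i
end

local elapsed = util.get_ticks() - started
print(string.format("elapsed ticks: %f", elapsed))
```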

Function `util.get_time()`

Returns current time as unix time in floating point representation

Parameters:

No parameters

Returns:

{number}: number of seconds since 01.01.1970

Function `util.time_to_string(seconds)`

Converts time from Unix time to HTTP date format

Parameters:

seconds {number}: unix timestamp

Returns:

{string}: date as HTTP date

Function `util.stat(fname)`

Performs stat(2) on a specified filepath and returns table of values

size: size of file in bytes
type: type of filepath: regular, directory, special
mtime: modification time as unix time

Parameters:

fname {string}: path to the file

Returns:

{string,table}: error string (set only when an error has occurred) and the table of values

Example:

local util = require "rspamd_util"

-- stat returns the error string first, then the table of values
local err,st = util.stat('/etc/passwd')

if err then
  -- handle error
else
  print(st['size'])
end

Function `util.unlink(fname)`

Removes the specified file from the filesystem

Parameters:

fname {string}: filename to remove

Returns:

{boolean,[string]}: true if file has been deleted or false,'error string'

Function `util.lock_file(fname, [fd])`

Lock the specified file. This function returns {number} which must be passed to util.unlock_file after usage or you'll have a resource leak

Parameters:

fname {string}: filename to lock
fd {number}: use the specified fd instead of opening one

Returns:

{number|nil,string}: number if locking was successful or nil + error otherwise

Function `util.unlock_file(fd, [close_fd])`

Unlock the specified file closing the file descriptor associated.

Parameters:

fd {number}: descriptor to unlock
close_fd {boolean}: close descriptor on unlocking (default: TRUE)

Returns:

{boolean[,string]}: true if a file was unlocked

Function `util.create_file(fname, [mode])`

Creates the specified file with the default mode 0644

Parameters:

fname {string}: filename to create
mode {number}: open mode (you should use octal number here)

Returns:

{number|nil,string}: file descriptor or pair nil + error string

Function `util.close_file(fd)`

Closes descriptor fd

Parameters:

fd {number}: descriptor to close

Returns:

{boolean[,string]}: true if a file was closed
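
The file helpers (`util.create_file`, `util.lock_file`, `util.unlock_file`, `util.close_file`, `util.unlink`) can be combined as sketched below. The scratch path and the mode are arbitrary choices for this example:

```lua
local util = require "rspamd_util"

local path = os.tmpname() -- arbitrary scratch path for the example
-- Lua has no octal literals, so build the mode from an octal string
local mode = tonumber("600", 8)

local fd, err = util.create_file(path, mode)
if not fd then
  error("cannot create file: " .. tostring(err))
end

-- Reuse the descriptor we already hold instead of opening a new one
local locked, lock_err = util.lock_file(path, fd)
if locked then
  -- ... work that needs the lock goes here ...
  -- Keep the descriptor open on unlock so it can be closed explicitly
  util.unlock_file(fd, false)
else
  print("cannot lock file: " .. tostring(lock_err))
end

util.close_file(fd)

local removed, rm_err = util.unlink(path)
if not removed then
  print("cannot remove file: " .. tostring(rm_err))
end
```

Calling `util.unlock_file(fd)` without the second argument would also close the descriptor, which makes the separate `util.close_file` call unnecessary.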

Function `util.random_hex(size)`

Returns random hex string of the specified size

Parameters:

size {number}: length of desired string in bytes

Returns:

{string}: string with random hex digits

Function `util.zstd_compress(data, [level=1])`

Compresses input using zstd compression

Parameters:

data {string/rspamd_text}: input data
level {number}: optional compression level (default: 1)

Returns:

{rspamd_text}: compressed data

Function `util.zstd_decompress(data)`

Decompresses input using zstd algorithm

Parameters:

data {string/rspamd_text}: compressed data

Returns:

{error,rspamd_text}: pair of error + decompressed text
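
Note the order of the returned pair: the error comes first. A round-trip sketch with `util.zstd_compress`, using arbitrary sample data:

```lua
local util = require "rspamd_util"

local original = string.rep("rspamd ", 100)
local compressed = util.zstd_compress(original)

-- The error is the first value, the decompressed text is the second one
local err, restored = util.zstd_decompress(compressed)
if err then
  print("decompression failed: " .. tostring(err))
else
  assert(tostring(restored) == original)
  print(#original, compressed:len())
end
```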

Function `util.gzip_decompress(data, [size_limit])`

Decompresses input using gzip algorithm

Parameters:

data {string/rspamd_text}: compressed data
size_limit {integer}: optional size limit

Returns:

{rspamd_text}: decompressed text

Function `util.inflate(data, [size_limit])`

Decompresses input using inflate algorithm

Parameters:

data {string/rspamd_text}: compressed data
size_limit {integer}: optional size limit

Returns:

{rspamd_text}: decompressed text

Function `util.gzip_compress(data, [level=1])`

Compresses input using gzip compression

Parameters:

data {string/rspamd_text}: input data
level {number}: optional compression level (default: 1)

Returns:

{rspamd_text}: compressed data

Function `util.normalize_prob(prob, [bias = 0.5])`

Normalize probabilities using a polynomial

Parameters:

prob {number}: probability param
bias {number}: number to subtract for making the final solution

Returns:

{number}: normalized number

Function `util.is_utf_spoofed(str, [str2])`

Returns true if a string is spoofed (possibly with another string str2)

Parameters:

str {string}: string to check
str2 {string}: optional second string to check against

Returns:

{boolean}: true if a string is spoofed

Function `util.get_string_stats(str)`

Returns table with number of letters and digits in string

Parameters:

str {string}: input string

Returns:

{table}: string stats; the keys are "digits" and "letters"

Function `util.is_valid_utf8(str)`

Returns true if a string is valid UTF8 string

Parameters:

str {string}: input string

Returns:

{boolean}: true if a string is a valid UTF8 string

Function `util.has_obscured_unicode(str)`

Returns true if a string has obscure UTF symbols (zero width spaces, order marks), ignores invalid utf characters

Parameters:

str {string}: input string

Returns:

{boolean}: true if a string has obscured unicode characters (+ character and offset if found)
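
A sketch of reading the extra return values. The sample string embeds U+200B ZERO WIDTH SPACE as raw UTF-8 bytes and is assumed to count as obscured; the variable names are illustrative:

```lua
local util = require "rspamd_util"

-- "pay" + U+200B ZERO WIDTH SPACE (UTF-8 bytes E2 80 8B) + "pal"
local suspicious = "pay\226\128\139pal"

-- The character and its offset are only returned when something is found
local found, ch, offset = util.has_obscured_unicode(suspicious)
if found then
  print("obscured character found", ch, offset)
end
```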

Function `util.readline([prompt])`

Returns string read from stdin with history and editing support

Parameters:

prompt {string}: optional prompt to display

Returns:

{string}: string read from the input (with line endings stripped)

Function `util.readpassphrase([prompt])`

Returns string read from stdin disabling echo

Parameters:

prompt {string}: optional prompt to display

Returns:

{string}: string read from the input (with line endings stripped)

Function `util.file_exists(file)`

Checks if a specified file exists and is available for reading

Parameters:

file {string}: path to the file to check

Returns:

{boolean,string}: true if file exists + string error if not

Function `util.mkdir(dir[, recursive])`

Creates a specified directory

Parameters:

dir {string}: directory to create
recursive {boolean}: optional flag to create the directory recursively

Returns:

{boolean[,error]}: true if directory has been created

Function `util.umask(mask)`

Sets new umask. Accepts either a numeric octal string, e.g. '022', or a plain number, e.g. 0x12 (since Lua does not support octal integer literals)

Parameters:

mask {string|number}: new umask

Returns:

{number}: old umask
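
Since the old umask is returned, a temporary change can be reverted afterwards. A minimal sketch; the mask value '077' is an arbitrary example:

```lua
local util = require "rspamd_util"

-- Tighten permissions for newly created files and remember the old mask
local old_mask = util.umask('077')

-- ... create private files here ...

-- Restore the previous umask (it was returned as a plain number)
util.umask(old_mask)
```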

Function `util.isatty()`

Returns true if stdout is a tty

Parameters:

No parameters

Returns:

{boolean}: true in case of output being tty

Function `util.pack(fmt, ...)`

Backport of Lua 5.3 string.pack function. Returns a binary string containing the values v1, v2, etc. packed (that is, serialized in binary form) according to the format string fmt. A format string is a sequence of conversion options. The conversion options are as follows:

<: sets little endian
>: sets big endian
=: sets native endian
![n]: sets maximum alignment to n (default is native alignment)
b: a signed byte (char)
B: an unsigned byte (char)
h: a signed short (native size)
H: an unsigned short (native size)
l: a signed long (native size)
L: an unsigned long (native size)
j: a lua_Integer
J: a lua_Unsigned
T: a size_t (native size)
i[n]: a signed int with n bytes (default is native size)
I[n]: an unsigned int with n bytes (default is native size)
f: a float (native size)
d: a double (native size)
n: a lua_Number
cn: a fixed-sized string with n bytes
z: a zero-terminated string
s[n]: a string preceded by its length coded as an unsigned integer with n bytes (default is a size_t)
x: one byte of padding
Xop: an empty item that aligns according to option op (which is otherwise ignored)
' ': (empty space) ignored

(A "[n]" means an optional integral numeral.) Except for padding, spaces, and configurations (options "xX <=>!"), each option corresponds to an argument (in string.pack) or a result (in string.unpack).

For options "!n", "sn", "in", and "In", n can be any integer between 1 and 16. All integral options check overflows; string.pack checks whether the given value fits in the given size; string.unpack checks whether the read value fits in a Lua integer.

Any format string starts as if prefixed by "!1=", that is, with maximum alignment of 1 (no alignment) and native endianness.

Alignment works as follows: For each option, the format gets extra padding until the data starts at an offset that is a multiple of the minimum between the option size and the maximum alignment; this minimum must be a power of 2. Options "c" and "z" are not aligned; option "s" follows the alignment of its starting integer.

All padding is filled with zeros by string.pack (and ignored by unpack).

Parameters:

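fmt {string}: format string
... {multiple}: values to pack according to fmt

Returns:

{string}: binary string containing the packed values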

Function `util.packsize(fmt)`

Returns size of the packed binary string returned for the same fmt argument by util.pack

Parameters:

fmt {string}: format string

Returns:

{number}: size of the packed binary string in bytes

Function `util.unpack(fmt, s [, pos])`

Unpacks string s according to the format string fmt as described in util.pack

Parameters:

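fmt {string}: format string
s {string}: packed binary string
pos {number}: optional position in s to start reading from (default: 1)

Returns:

{multiple}: list of unpacked values according to fmt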
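
A small sketch tying `util.pack`, `util.packsize` and `util.unpack` together, assuming the backport follows Lua 5.3 `string.pack` semantics as described under `util.pack`. The format and the values are arbitrary:

```lua
local util = require "rspamd_util"

-- Little endian: a 2-byte unsigned int followed by a 4-byte unsigned int
local fmt = "<I2I4"
local packed = util.pack(fmt, 513, 1)

-- No padding is added, since the default maximum alignment is 1
assert(#packed == util.packsize(fmt))

-- As with Lua 5.3 string.unpack, the values are followed by the position
-- of the first unread byte
local a, b, next_pos = util.unpack(fmt, packed)
assert(a == 513 and b == 1)
print(a, b, next_pos)
```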

Function `util.caseless_hash(str[, seed])`

Calculates caseless non-crypto hash from a string or rspamd text

Parameters:

str {string|rspamd_text}: string or lua_text
seed {number}: optional seed (0xdeadbabe by default)

Returns:

{int64}: boxed int64_t

Function `util.caseless_hash_fast(str[, seed])`

Calculates caseless non-crypto hash from a string or rspamd text

Parameters:

str {string|rspamd_text}: string or lua_text
seed {number}: optional seed (0xdeadbabe by default)

Returns:

{number}: number from int64_t
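
Since the hash is caseless, inputs that differ only in letter case are expected to hash to the same number when the same seed is used. A minimal sketch with an arbitrary seed:

```lua
local util = require "rspamd_util"

-- Use the same seed for all hashes that are going to be compared
local seed = 42 -- arbitrary example value
local h1 = util.caseless_hash_fast("Example.COM", seed)
local h2 = util.caseless_hash_fast("example.com", seed)

assert(h1 == h2)
print(h1)
```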

Function `util.get_hostname()`

Returns hostname for this machine

Parameters:

No parameters

Returns:

{string}: hostname

Function `util.get_uptime()`

Returns system uptime in seconds

Parameters:

No parameters

Returns:

{number}: uptime in seconds

Function `util.get_pid()`

Returns current process PID

Parameters:

No parameters

Returns:

{number}: process ID

Function `util.get_memory_usage()`

Returns memory usage information for current process

Parameters:

No parameters

Returns:

{table}: memory usage info with 'rss' and 'vsize' fields in bytes

Function `util.parse_content_type(ct_string, mempool)`

Parses content-type string to a table:

type
subtype
charset
boundary
other attributes

Parameters:

ct_string {string}: content type as string
mempool {rspamd_mempool}: needed to store temporary data (e.g. task pool)

Returns:

{table|nil}: parsed content type or nil if the content type cannot be parsed
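
A sketch of a standalone call. It assumes the `rspamd_mempool` module is available to create a temporary pool; inside a rule the task pool would normally be passed instead. The content type string is an arbitrary example:

```lua
local util = require "rspamd_util"
local rspamd_mempool = require "rspamd_mempool"

-- Temporary pool for this example only
local pool = rspamd_mempool.create()

local ct = util.parse_content_type('text/html; charset="utf-8"', pool)
if ct then
  print(ct.type, ct.subtype, ct.charset)
else
  print("cannot parse content type")
end

pool:destroy()
```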

Function `util.mime_header_encode(hdr[, is_structured])`

Encodes header if needed

Parameters:

hdr {string}: input header
is_structured {boolean}: if true, then we encode as structured header (e.g. encode all non alpha-numeric characters)

Returns:

{string}: encoded header

Function `util.btc_polymod(input_values)`

Performs bitcoin polymod function

Parameters:

input_values {table|numbers}: no description

Returns:

{boolean}: true if polymod has been successful

Function `util.parse_smtp_date(str[, local_tz])`

Converts an SMTP date string to unix timestamp

Parameters:

str {string}: input string
local_tz {boolean}: convert to local tz if true

Returns:

{number}: time as unix timestamp (converted to float)
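
A short sketch combining this function with `util.time_to_string`; the date string is an arbitrary example, and the `nil` check is a defensive assumption about unparsable input:

```lua
local util = require "rspamd_util"

local ts = util.parse_smtp_date("Mon, 01 Jan 2024 12:00:00 +0000")
if ts then
  -- Print the timestamp and the same moment in HTTP date format
  print(ts, util.time_to_string(ts))
end
```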
