Module rspamd_trie
The rspamd_trie module provides a data structure for searching many patterns
at once in arbitrary text (or binary chunks). A search runs in at most
O(n + m + z) time, where n is the length of the text, m is the total length of the patterns and z is the number of pattern matches found in the text.
Here is a typical example of trie usage:

    local rspamd_trie = require "rspamd_trie"

    -- Patterns may contain arbitrary bytes, including NUL
    local patterns = {'aab', 'ab', 'bcd\0ef'}
    local trie = rspamd_trie.create(patterns)

    -- Called once per match: number is the 1-based pattern index,
    -- pos is the end offset of the match
    local function trie_callback(number, pos)
      print('Matched pattern number ' .. tostring(number) .. ' at pos: ' .. tostring(pos))
    end

    trie:match('some big text', trie_callback)

Brief content:
Methods:
| Method | Description |
|---|---|
| trie:match(input, [cb][, report_start]) | Search for patterns in input, invoking cb on each match and optionally reporting match start offsets. |
| trie:search_mime(task, cb) | Helper method that searches for patterns within the text parts of a message in an rspamd task. |
| trie:search_rawmsg(task, cb[, caseless]) | Helper method that searches for patterns within the whole undecoded content of an rspamd task. |
| trie:search_rawbody(task, cb[, caseless]) | Helper method that searches for patterns within the whole undecoded content of the task's body (not including headers). |
Methods
The module rspamd_trie defines the following methods.
Method trie:match(input, [cb][, report_start])
Search for patterns in input, invoking cb on each match and optionally reporting match start offsets.
Offset convention: the pattern index idx is 1-based (Lua style). Match
offsets are byte offsets and are 0-based: when report_start is set the
start is the inclusive offset of the first matched byte and the end is
the exclusive offset one past the last matched byte (so end - start is the
match length). When report_start is not set only the (exclusive) end
offset is reported, matching the historical behaviour. Start offsets are
available for every occurrence by default; pass rspamd_trie.flags.som at
creation time to request them explicitly (and to keep them even when
combined with single_match/no_start).
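
The offset convention can be illustrated with a small helper in plain Lua. This is only a sketch: `matched_bytes` and `start_from_end` are hypothetical names, not part of rspamd_trie. They assume the 0-based, end-exclusive offsets described above and convert them to the 1-based, inclusive indices that `string.sub` expects.

```lua
-- Hypothetical helper (not part of rspamd_trie): extract the matched bytes
-- from `input` given 0-based offsets where `finish` is exclusive.
local function matched_bytes(input, start, finish)
  -- string.sub is 1-based and inclusive at both ends, so only the start shifts
  return string.sub(input, start + 1, finish)
end

-- Hypothetical helper: when only the end offset is reported and the pattern
-- is known, the start offset follows from the pattern length.
local function start_from_end(finish, pattern)
  return finish - #pattern
end

-- 'ab' occupies bytes 1..2 (0-based) of 'xabz', so start = 1 and end = 3
print(matched_bytes('xabz', 1, 3)) -- prints "ab"
print(start_from_end(3, 'ab'))     -- prints 1
```

Keeping the conversion in one place avoids off-by-one mistakes when offsets from the trie are mixed with Lua string functions.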
Parameters:
- `input` {table or string}: one or several (if `input` is an array) strings of input text
- `cb` {function}: callback called on each pattern match in form `function (idx, pos)` where `idx` is the 1-based pattern index and `pos` is the match end offset; when `report_start` is set `pos` is instead a table `{start, end}`
- `report_start` {boolean}: report both start and end offset when matching patterns
Returns:
- {boolean}: `true` if any pattern has been found (`cb` might be called multiple times however). If `cb` is not defined then it returns a table indexed by pattern number, each entry being a list of every occurrence (either the end offset, or `{start, end}` when `report_start` is set)
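
When `cb` is omitted, the returned table can be walked with ordinary Lua loops. The sketch below does not call rspamd_trie: it uses a hand-written `sample` table that follows the shape described above (pattern index mapped to a list of occurrences) purely for illustration, and `count_matches` is a hypothetical helper name.

```lua
-- Hypothetical helper: count occurrences per pattern in a result table
-- shaped like {[idx] = {occurrence, ...}}.
local function count_matches(result)
  local counts = {}
  local total = 0
  -- pairs rather than ipairs: this sketch assumes that patterns which never
  -- matched may be absent, leaving holes in the index sequence
  for idx, occurrences in pairs(result) do
    counts[idx] = #occurrences
    total = total + #occurrences
  end
  return counts, total
end

-- Hand-written sample data, not real output: pattern 1 ends at offsets 3
-- and 9, pattern 3 ends at offset 7, pattern 2 never matched.
local sample = { [1] = {3, 9}, [3] = {7} }
local counts, total = count_matches(sample)
print(total) -- prints 3
```

The same loop works when `report_start` is set, since each occurrence is then a `{start, end}` table and the list length is unchanged.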
Method trie:search_mime(task, cb)
This is a helper method that searches for patterns within the text parts of a message in an rspamd task.
Parameters:
- `task` {task}: task object
- `cb` {function}: callback called on each pattern match (see trie:match)
- `caseless` {boolean}: if `true` then the match ignores character case (ASCII only)
Returns:
- {boolean}: `true` if any pattern has been found (`cb` might be called multiple times however)
Method trie:search_rawmsg(task, cb[, caseless])
This is a helper method that searches for patterns within the whole undecoded content of an rspamd task.
Parameters:
- `task` {task}: task object
- `cb` {function}: callback called on each pattern match (see trie:match)
- `caseless` {boolean}: if `true` then the match ignores character case (ASCII only)
Returns:
- {boolean}: `true` if any pattern has been found (`cb` might be called multiple times however)
Method trie:search_rawbody(task, cb[, caseless])
This is a helper method that searches for patterns within the whole undecoded content of the task's body (not including headers).
Parameters:
- `task` {task}: task object
- `cb` {function}: callback called on each pattern match (see trie:match)
- `caseless` {boolean}: if `true` then the match ignores character case (ASCII only)
Returns:
- {boolean}: `true` if any pattern has been found (`cb` might be called multiple times however)
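
Because the helper methods take the same callback form as trie:match, one callback can serve all of them. The following is a minimal sketch in plain Lua: `make_collector` is a hypothetical name, and the commented-out call assumes a `trie` and a `task` object supplied by rspamd, so only the collector itself runs standalone.

```lua
-- Hypothetical helper: build a callback that records every match it receives.
local function make_collector()
  local hits = {}
  local function cb(idx, pos)
    -- idx is the 1-based pattern index, pos is the match position as
    -- described for trie:match
    hits[#hits + 1] = { idx = idx, pos = pos }
  end
  return cb, hits
end

local cb, hits = make_collector()

-- Inside rspamd the callback would be passed to a helper, for example:
--   local found = trie:search_rawbody(task, cb, true) -- caseless match
-- Here it is invoked by hand to show the data it accumulates.
cb(2, 5)
cb(1, 12)
print(#hits) -- prints 2
```

Collecting matches first and acting on them afterwards keeps the callback cheap, which matters when it may be called many times per message.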