Module rspamd_url
This module provides routines to handle URLs and extract URLs from text.
Objects of this class are returned, for example, by task:get_urls() or task:get_emails(). You can also create rspamd_url objects from any text.
Example:
```lua
local url = require "rspamd_url"
local mpool = require "rspamd_mempool"

-- Load the public suffix list; Rspamd does this itself when the code runs inside it
url.init("/usr/share/rspamd/effective_tld_names.dat")

local pool = mpool.create()
local res = url.create(pool, 'Look at: http://user@test.example.com/test?query')
local t = res:to_table()
-- Content of t:
-- url = ['http://test.example.com/test?query']
-- host = ['test.example.com']
-- user = ['user']
-- path = ['test']
-- tld = ['example.com']

pool:destroy() -- res is destroyed here, so you must not use it afterwards
local mistake = res:to_table() -- INVALID! The pool that owned res is already destroyed
```
Brief content:
Functions:
Function | Description |
---|---|
url.create([mempool,] str, [{flags_table}]) | Create a new URL object from text. |
url.init(tld_file) | Initialize url library if not initialized yet by Rspamd. |
Methods:
Method | Description |
---|---|
url:get_length() | Get length of the url. |
url:get_host() | Get domain part of the url. |
url:get_port() | Get port of the url. |
url:get_user() | Get user part of the url (e.g. username in email). |
url:get_path() | Get path of the url. |
url:get_query() | Get query of the url. |
url:get_fragment() | Get fragment of the url. |
url:get_text() | Get full content of the url. |
url:tostring() | Get full content of the url or user@domain in case of email. |
url:to_http() | Get URL suitable for HTTP request (e.g. by trimming fragment and user parts). |
url:get_raw() | Get full content of the url as it was parsed (e.g. with urldecode). |
url:is_phished() | Check whether URL is treated as phished. |
url:is_redirected() | Check whether URL was redirected. |
url:is_obscured() | Check whether URL is treated as obscured or obfuscated (e.g. numbers in IP address or other hacks). |
url:is_html_displayed() | Check whether URL is just displayed in HTML (i.e. not a real href). |
url:is_subject() | Check whether URL is found in subject. |
url:get_phished() | Get another URL that pretends to be this URL (e.g. used in phishing). |
url:set_redirected(url, pool) | Set url as redirected to another url. |
url:get_tld() | Get effective second level domain part (eSLD) of the url host. |
url:get_protocol() | Get protocol name. |
url:get_count() | Return number of occurrences for this particular URL. |
url:get_visible() | Get visible part of the url with html tags stripped. |
url:to_table() | Return url as a table with the fields url, host, user, path, tld and protocol. |
url:get_flags() | Return flags for the URL as a map 'flag' -> true for all flags set. |
Functions
The module rspamd_url defines the following functions.
Function url.create([mempool,] str, [{flags_table}])
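Create a new URL object from text
Parameters:
mempool {rspamd_mempool}
: (optional) pool for URL, e.g. task:get_mempool()
str {string}
: text that contains URL (can also contain other stuff)
flags_table {table}
: (optional) flags to set on the new URL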
Returns:
{url}
: new url object that exists as long as the corresponding mempool exists
Function url.init(tld_file)
Initialize url library if not initialized yet by Rspamd
Parameters:
tld_file {string}
: path to effective_tld_names.dat file (public suffix list)
Returns:
- nothing
Methods
The module rspamd_url defines the following methods.
Method url:get_length()
Get length of the url
Parameters:
No parameters
Returns:
{number}
: length of url in bytes
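As a sketch of how get_length() might be used, the helper below collects URLs whose length exceeds a threshold. The name select_long_urls and the threshold value are hypothetical; the code assumes urls is an array of url objects (for example the result of task:get_urls()) and calls only the documented get_length() method, so it can also be exercised with stand-in tables outside Rspamd.

```lua
-- Hypothetical helper: pick URLs whose length in bytes, as reported by
-- url:get_length(), is greater than a caller-chosen threshold.
-- `urls` is assumed to be an array of url objects.
local function select_long_urls(urls, max_len)
  local long = {}
  for _, u in ipairs(urls) do
    if u:get_length() > max_len then
      long[#long + 1] = u
    end
  end
  return long
end
```

Inside Rspamd it could be called as select_long_urls(task:get_urls(), 200).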
Method url:get_host()
Get domain part of the url
Parameters:
No parameters
Returns:
{string}
: domain part of URL
Method url:get_port()
Get port of the url
Parameters:
No parameters
Returns:
{number}
: url port
Method url:get_user()
Get user part of the url (e.g. username in email)
Parameters:
No parameters
Returns:
{string}
: user part of URL
Method url:get_path()
Get path of the url
Parameters:
No parameters
Returns:
{string}
: path part of URL
Method url:get_query()
Get query of the url
Parameters:
No parameters
Returns:
{string}
: query part of URL
Method url:get_fragment()
Get fragment of the url
Parameters:
No parameters
Returns:
{string}
: fragment part of URL
Method url:get_text()
Get full content of the url
Parameters:
No parameters
Returns:
{string}
: url string
Method url:tostring()
Get full content of the url or user@domain in case of email
Parameters:
No parameters
Returns:
{string}
: url as a string
Method url:to_http()
Get URL suitable for HTTP request (e.g. by trimming fragment and user parts)
Parameters:
No parameters
Returns:
{string}
: url as a string
Method url:get_raw()
Get full content of the url as it was parsed (e.g. with urldecode)
Parameters:
No parameters
Returns:
{string}
: url string
Method url:is_phished()
Check whether URL is treated as phished
Parameters:
No parameters
Returns:
{boolean}
: true if URL is phished
Method url:is_redirected()
Check whether URL was redirected
Parameters:
No parameters
Returns:
{boolean}
: true if URL is redirected
Method url:is_obscured()
Check whether URL is treated as obscured or obfuscated (e.g. numbers in IP address or other hacks)
Parameters:
No parameters
Returns:
{boolean}
: true if URL is obscured
Method url:is_html_displayed()
Check whether URL is just displayed in HTML (i.e. not a real href)
Parameters:
No parameters
Returns:
{boolean}
: true if URL is displayed only
Method url:is_subject()
Check whether URL is found in subject
Parameters:
No parameters
Returns:
{boolean}
: true if URL is found in subject
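The boolean checks above can be combined in one pass. The sketch below counts how many URLs in a list satisfy each of them; count_by_predicate is a hypothetical name, and the code assumes the list holds url objects (for example from task:get_urls()) exposing the documented is_phished(), is_redirected(), is_obscured(), is_html_displayed() and is_subject() methods.

```lua
-- Hypothetical helper: count how many URLs satisfy each boolean predicate
-- documented above. `urls` is assumed to be an array of url objects.
local predicates = { 'is_phished', 'is_redirected', 'is_obscured',
  'is_html_displayed', 'is_subject' }

local function count_by_predicate(urls)
  local counts = {}
  for _, name in ipairs(predicates) do
    counts[name] = 0
  end
  for _, u in ipairs(urls) do
    for _, name in ipairs(predicates) do
      -- u[name] looks up the method; u is passed explicitly as self
      if u[name](u) then
        counts[name] = counts[name] + 1
      end
    end
  end
  return counts
end
```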
Method url:get_phished()
Get another URL that pretends to be this URL (e.g. used in phishing)
Parameters:
No parameters
Returns:
{url}
: phished URL
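A minimal sketch of pairing is_phished() with get_phished(): for every URL marked as phished, it records the URL text together with the text of the URL returned by get_phished(). The helper name phished_pairs is hypothetical, and the nil check assumes (not confirmed by this reference) that get_phished() may return nil when no related URL is stored.

```lua
-- Hypothetical helper: for each URL marked as phished, collect its text
-- and the text of the URL returned by url:get_phished().
local function phished_pairs(urls)
  local pairs_out = {}
  for _, u in ipairs(urls) do
    if u:is_phished() then
      local other = u:get_phished()
      pairs_out[#pairs_out + 1] = {
        url = u:get_text(),
        -- assumption: get_phished() can return nil, so guard against it
        phished = other and other:get_text() or nil,
      }
    end
  end
  return pairs_out
end
```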
Method url:set_redirected(url, pool)
Set url as redirected to another url
Parameters:
url {string|url}
: the URL that this url is redirected to
pool {pool}
: memory pool to allocate memory if needed
Returns:
{url}
: parsed redirected url (if needed)
Method url:get_tld()
Get effective second level domain part (eSLD) of the url host
Parameters:
No parameters
Returns:
{string}
: effective second level domain part (eSLD) of the url host
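To illustrate what the eSLD is relative to the full host, the sketch below splits a host such as test.example.com into the labels in front of the eSLD (test) and the eSLD itself (example.com). split_host is a hypothetical name; it relies only on get_host() and get_tld() and assumes get_tld() returns a suffix of the host, falling back to the unchanged host otherwise.

```lua
-- Hypothetical helper: split a URL host into the labels in front of the
-- eSLD and the eSLD itself, using only url:get_host() and url:get_tld().
local function split_host(u)
  local host = (u:get_host() or ''):lower()
  local tld = (u:get_tld() or ''):lower()
  if tld == '' or host == tld then
    -- nothing in front of the eSLD (or no eSLD known)
    return '', host
  end
  local suffix = '.' .. tld
  -- plain comparison of the tail of the host, no Lua patterns involved
  if #host > #suffix and host:sub(-#suffix) == suffix then
    return host:sub(1, #host - #suffix), tld
  end
  -- host does not end with the eSLD; return it unchanged
  return '', host
end
```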
Method url:get_protocol()
Get protocol name
Parameters:
No parameters
Returns:
{string}
: protocol as a string
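A short sketch of filtering by get_protocol(): it keeps only URLs whose protocol name is in an allow-list. The helper name filter_by_protocol is hypothetical, and the protocol strings used in the usage note ('http', 'https') are an assumption about the values get_protocol() returns.

```lua
-- Hypothetical helper: keep only URLs whose protocol, as returned by
-- url:get_protocol(), appears in the `allowed` list of names.
local function filter_by_protocol(urls, allowed)
  local set = {}
  for _, p in ipairs(allowed) do
    set[p] = true
  end
  local out = {}
  for _, u in ipairs(urls) do
    if set[u:get_protocol()] then
      out[#out + 1] = u
    end
  end
  return out
end
```

For example, filter_by_protocol(task:get_urls(), { 'http', 'https' }) would keep web links and drop everything else, assuming those are the protocol names Rspamd reports.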
Method url:get_count()
Return number of occurrences for this particular URL
Parameters:
No parameters
Returns:
{number}
: number of occurrences
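Because get_count() reports how often a particular URL occurred, totals per domain can be computed by combining it with get_tld(). The sketch below does that; occurrences_by_tld is a hypothetical name, and URLs without an eSLD are grouped under the empty string.

```lua
-- Hypothetical helper: sum the occurrences reported by url:get_count()
-- for each eSLD reported by url:get_tld().
local function occurrences_by_tld(urls)
  local totals = {}
  for _, u in ipairs(urls) do
    local tld = u:get_tld() or ''
    totals[tld] = (totals[tld] or 0) + u:get_count()
  end
  return totals
end
```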
Method url:get_visible()
Get visible part of the url with html tags stripped
Parameters:
No parameters
Returns:
{string}
: url string
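One possible use of get_visible() is comparing the text a reader sees with the link's real host. The sketch below reports whether the visible part mentions the host, using a case-insensitive plain substring search. visible_mentions_host is a hypothetical name, and this is a deliberately simple heuristic, not a description of Rspamd's own phishing checks; it also assumes get_visible() may return nil or an empty string for URLs that did not come from HTML.

```lua
-- Hypothetical heuristic: does the visible part of a link
-- (url:get_visible()) mention the link's real host (url:get_host())?
-- Returns nil when there is no visible text to compare.
local function visible_mentions_host(u)
  local visible = u:get_visible()
  if not visible or visible == '' then
    return nil
  end
  local host = (u:get_host() or ''):lower()
  -- plain substring search: the 4th argument true disables Lua patterns
  return host ~= '' and visible:lower():find(host, 1, true) ~= nil
end
```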
Method url:to_table()
Return url as a table with the following fields:
url
: full content
host
: hostname part
user
: user part
path
: path part
tld
: effective second level domain (eSLD) of the host
protocol
: url protocol
Parameters:
No parameters
Returns:
{table}
: URL as a table
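The table returned by to_table() is convenient for logging. The sketch below turns it into a single key=value line using the fields listed above, skipping missing or empty ones. describe_url_table and the field order are hypothetical choices; the input shape follows the to_table() example at the top of this page.

```lua
-- Hypothetical helper: render the table returned by url:to_table() as one
-- log-friendly line, using the documented fields in a fixed order.
local fields = { 'protocol', 'user', 'host', 'path', 'tld', 'url' }

local function describe_url_table(t)
  local parts = {}
  for _, k in ipairs(fields) do
    local v = t[k]
    if v ~= nil and v ~= '' then
      parts[#parts + 1] = k .. '=' .. tostring(v)
    end
  end
  return table.concat(parts, ' ')
end
```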
Method url:get_flags()
Return flags for the URL as a map 'flag' -> true for all flags set; possible flags are:
phished
: URL is likely phished
numeric
: URL is numeric (e.g. IP address)
obscured
: URL was obscured
redirected
: URL comes from a redirector
html_displayed
: URL is used just for displaying purposes
text
: URL comes from the text
subject
: URL comes from the subject
host_encoded
: URL host part is encoded
schema_encoded
: URL schema part is encoded
query_encoded
: URL query part is encoded
missing_slashes
: URL has some slashes missing
idn
: URL has international characters
has_port
: URL has port
has_user
: URL has user part
schemaless
: URL has no schema
unnormalised
: URL contains Unicode that is not normalised
zw_spaces
: URL has some zero width spaces
url_displayed
: URL has some other url-like string in visible part
image
: URL is from src attribute of img HTML tag
Parameters:
No parameters
Returns:
{table}
: URL flags
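Since get_flags() returns a map from flag name to true, checking for several flags at once is a simple table lookup. The sketch below returns true and the first matching flag name if any of the requested flags is set; has_any_flag is a hypothetical name, and the flag names in the usage note come from the list above.

```lua
-- Hypothetical helper: check whether a flags map, as returned by
-- url:get_flags() (flag name -> true), contains any of the given names.
-- Returns true plus the first matching name, or false.
local function has_any_flag(flags, names)
  for _, name in ipairs(names) do
    if flags[name] then
      return true, name
    end
  end
  return false
end
```

For a url object u, has_any_flag(u:get_flags(), { 'numeric', 'obscured' }) would report whether the URL is numeric or obscured.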