Module rspamd_textpart

This module provides methods to inspect and manipulate the data of text parts. Text parts can be obtained from an rspamd_task object with the task:get_text_parts() method.

Example:

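rspamd_config.R_EMPTY_IMAGE = function (task)
	-- Use locals so that the callback does not leak globals
	local parts = task:get_text_parts()
	if parts then
		for _,part in ipairs(parts) do
			if part:is_empty() then
				-- An empty text part was found: the rule fires only
				-- if task:get_texts() returns a value
				local texts = task:get_texts()
				if texts then
					return true
				end
				return false
			end
		end
	end
	return false
end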

Brief content:

Methods:

Method	Description
text_part:is_utf()	Return TRUE if part is a valid utf text.
text_part:has_8bit_raw()	Return TRUE if a part has raw 8bit characters.
text_part:has_8bit()	Return TRUE if a part has encoded 8bit characters.
text_part:get_content([type])	Get the text of the part (html tags stripped).
text_part:get_raw_content()	Get the original text of the part.
text_part:get_content_oneline()	Get the text of the part (html tags and newlines stripped).
text_part:get_length()	Get length of the text of the part.
text_part:get_raw_length()	Get length of the raw content of the part (e.g. HTML with tags unstripped).
text_part:get_urls_length()	Get length of the urls within the part.
text_part:get_lines_count()	Get the number of lines in the part.
text_part:get_stats()	Returns a table with statistics of the part.
text_part:get_words_count()	Get the number of words in the part.
text_part:get_words([how])	Get words in the part.
text_part:filter_words(regexp[, how[, max]])	Filter words using a regexp.
text_part:is_empty()	Returns true if the specified part is empty.
text_part:is_html()	Returns true if the specified part has HTML content.
text_part:get_html()	Returns html content of the specified part.
text_part:get_language()	Returns the code of the most used unicode script in the text part.
text_part:get_charset()	Returns part real charset.
text_part:get_languages()	Returns array of tables of all languages detected for a part.
text_part:get_fuzzy_hashes(mempool)	Returns the direct hash of the text part as a string and an array [1..32] of shingles.
text_part:get_mimepart()	Returns the mime part object corresponding to this text part.

Methods

The module rspamd_textpart defines the following methods.

Method text_part:is_utf()

Return TRUE if part is a valid utf text

Parameters:

No parameters

Returns:

  • {boolean}: true if part is valid UTF8 part

Method text_part:has_8bit_raw()

Return TRUE if a part has raw 8bit characters

Parameters:

No parameters

Returns:

  • {boolean}: true if a part has raw 8bit characters

Method text_part:has_8bit()

Return TRUE if a part has encoded 8bit characters

Parameters:

No parameters

Returns:

  • {boolean}: true if a part has encoded 8bit characters

Method text_part:get_content([type])

Get the text of the part (html tags stripped). Optional type defines type of content to get:

  • content (default): utf8 content with HTML tags stripped and newlines preserved
  • content_oneline: utf8 content with HTML tags and newlines stripped
  • raw: raw content, not mime decoded nor utf8 converted
  • raw_parsed: raw content, mime decoded, not utf8 converted
  • raw_utf: raw content, mime decoded, utf8 converted (but with HTML tags and newlines)

Parameters:

  • type {string}: optional type of content to get (one of the values listed above)

Returns:

  • {text}: UTF8 encoded content of the part (zero-copy if not converted to a lua string)
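
The type argument can be exercised with a short sketch like the one below. It assumes part is a text part returned by task:get_text_parts(); the helper names text_or_nil and content_variants are hypothetical and only serve this illustration. Converting the returned text object with tostring() copies the data, so this is probably best kept for small parts or debugging.

```lua
-- Convert an rspamd text object to a Lua string (this makes a copy).
local function text_or_nil(t)
  if t == nil then
    return nil
  end
  return tostring(t)
end

-- Hypothetical helper: read one text part in three representations.
-- `part` is assumed to come from task:get_text_parts().
local function content_variants(part)
  return {
    content = text_or_nil(part:get_content()),                  -- default type
    oneline = text_or_nil(part:get_content('content_oneline')), -- tags and newlines stripped
    raw = text_or_nil(part:get_content('raw')),                 -- not mime decoded, not converted
  }
end
```

Inside a rule callback this could be called as content_variants(part) for each part yielded by ipairs(task:get_text_parts()).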

Method text_part:get_raw_content()

Get the original text of the part

Parameters:

No parameters

Returns:

  • {text}: original content of the part (zero-copy if not converted to a lua string)

Method text_part:get_content_oneline()

Get the text of the part (html tags and newlines stripped)

Parameters:

No parameters

Returns:

  • {text}: UTF8 encoded content of the part (zero-copy if not converted to a lua string)

Method text_part:get_length()

Get length of the text of the part

Parameters:

No parameters

Returns:

  • {integer}: length of part in bytes

Method text_part:get_raw_length()

Get length of the raw content of the part (e.g. HTML with tags unstripped)

Parameters:

No parameters

Returns:

  • {integer}: length of part in bytes

Method text_part:get_urls_length()

Get length of the urls within the part

Parameters:

No parameters

Returns:

  • {integer}: length of urls in bytes

Method text_part:get_lines_count()

Get the number of lines in the part

Parameters:

No parameters

Returns:

  • {integer}: number of lines in the part

Method text_part:get_stats()

Returns a table with the following data:

  • lines: number of lines
  • spaces: number of spaces
  • double_spaces: number of double spaces
  • empty_lines: number of empty lines
  • non_ascii_characters: number of non ascii characters
  • ascii_characters: number of ascii characters

Parameters:

No parameters

Returns:

  • {table}: table of stats
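
As a minimal sketch of how these counters might be used, the hypothetical helper below derives the share of non-ASCII characters in a part. It assumes part is a text part and treats a missing counter as zero; the helper name non_ascii_ratio is not part of the API.

```lua
-- Hypothetical helper: share of non-ASCII characters in a text part,
-- computed from the counters returned by get_stats().
local function non_ascii_ratio(part)
  local stats = part:get_stats()
  local ascii = stats.ascii_characters or 0
  local non_ascii = stats.non_ascii_characters or 0
  local total = ascii + non_ascii
  if total == 0 then
    return 0 -- empty part: avoid division by zero
  end
  return non_ascii / total
end
```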

Method text_part:get_words_count()

Get the number of words in the part

Parameters:

No parameters

Returns:

  • {integer}: number of words in the part

Method text_part:get_words([how])

Get words in the part. Optional how argument defines type of words returned:

  • stem: stemmed words (default)
  • norm: normalised words (utf normalised + lowercased)
  • raw: raw words in utf (if possible)
  • full: list of tables, each table has the following fields:
    • [1] - stemmed word
    • [2] - normalised word
    • [3] - raw word
    • [4] - flags (table of strings)

Parameters:

  • how {string}: optional type of words to return (one of the values listed above)

Returns:

  • {table/strings}: words in the part
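
The positional layout of the full format is easy to misread, so the following sketch maps it to named fields. The helper name named_words is an assumption made for this example, and part is assumed to be a text part.

```lua
-- Hypothetical helper: turn the positional tables returned by
-- get_words('full') into tables with named fields.
local function named_words(part)
  local out = {}
  for _, w in ipairs(part:get_words('full') or {}) do
    out[#out + 1] = {
      stem = w[1],  -- stemmed word
      norm = w[2],  -- normalised word
      raw = w[3],   -- raw word
      flags = w[4], -- table of strings
    }
  end
  return out
end
```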

Method text_part:filter_words(regexp[, how[, max]])

Filter words using a regexp. The optional how argument defines the type of words returned:

  • stem: stemmed words (default)
  • norm: normalised words (utf normalised + lowercased)
  • raw: raw words in utf (if possible)
  • full: list of tables, each table has the following fields:
    • [1] - stemmed word
    • [2] - normalised word
    • [3] - raw word
    • [4] - flags (table of strings)

Parameters:

  • regexp {rspamd_regexp}: regexp to match
  • how {string}: what words to extract
  • max {number}: maximum number of hits returned (all hits if <= 0 or nil)

Returns:

  • {table/strings}: words matching regexp

Method text_part:is_empty()

Returns true if the specified part is empty

Parameters:

No parameters

Returns:

  • {bool}: whether a part is empty

Method text_part:is_html()

Returns true if the specified part has HTML content

Parameters:

No parameters

Returns:

  • {bool}: whether a part is HTML part

Method text_part:get_html()

Returns html content of the specified part

Parameters:

No parameters

Returns:

  • {html}: html content

Method text_part:get_language()

Returns the code of the most used unicode script in the text part. Does not work with raw parts

Parameters:

No parameters

Returns:

  • {string}: short abbreviation (such as ru) for the script’s language

Method text_part:get_charset()

Returns part real charset

Parameters:

No parameters

Returns:

  • {string}: charset of the part

Method text_part:get_languages()

Returns array of tables of all languages detected for a part:

  • ‘code’: language code (short string)
  • ‘prob’: logarithm of probability

Parameters:

No parameters

Returns:

  • {array|tables}: all languages detected for the part
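
A possible way to reduce this array to a single language is sketched below. Because the logarithm is monotonic, the entry with the largest prob is also the most probable one. The helper name most_likely_language is hypothetical, and part is assumed to be a text part.

```lua
-- Hypothetical helper: pick the language code with the highest 'prob'.
-- Returns nil when no language was detected.
local function most_likely_language(part)
  local best
  for _, lang in ipairs(part:get_languages() or {}) do
    if best == nil or lang.prob > best.prob then
      best = lang
    end
  end
  return best and best.code or nil
end
```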

Method text_part:get_fuzzy_hashes(mempool)

Returns the direct hash of the text part as a string and an array [1..32] of shingles, each represented as the following table:

  • [1] - 64 bit fuzzy hash represented as a string
  • [2..4] - strings used to generate this hash

Parameters:

  • mempool {rspamd_mempool}: memory pool (usually task pool)

Returns:

  • {string,array|tables}: fuzzy hashes calculated
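
The two return values can be unpacked as in the sketch below. The helper name shingle_hashes is hypothetical, and pool is assumed to be a memory pool such as the one returned by task:get_mempool().

```lua
-- Hypothetical helper: return the direct hash and a flat list of the
-- shingle hashes, dropping the strings used to generate them.
local function shingle_hashes(part, pool)
  local digest, shingles = part:get_fuzzy_hashes(pool)
  local hashes = {}
  for i, sh in ipairs(shingles or {}) do
    hashes[i] = sh[1] -- [1] is the 64 bit fuzzy hash as a string
  end
  return digest, hashes
end
```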

Method text_part:get_mimepart()

Returns the mime part object corresponding to this text part

Parameters:

No parameters

Returns:

  • {mimepart}: mimepart object

Module rspamd_mimepart

This module provides access to mime parts found in a message

Example:

rspamd_config.MISSING_CONTENT_TYPE = function(task)
	local parts = task:get_parts()
	if parts and #parts > 1 then
		-- We have more than one part
		for _,p in ipairs(parts) do
			local ct = p:get_header('Content-Type')
			-- And some parts have no Content-Type header
			if not ct then
				return true
			end
		end
	end
	return false
end

Brief content:

Methods:

Method	Description
mime_part:get_header(name[, case_sensitive])	Get decoded value of a header specified with optional case_sensitive flag.
mime_part:get_header_raw(name[, case_sensitive])	Get raw value of a header specified with optional case_sensitive flag.
mime_part:get_header_full(name[, case_sensitive])	Get full information about a header specified with optional case_sensitive flag.
mime_part:get_header_count(name[, case_sensitive])	Lightweight version if you need just a header’s count.
mime_part:get_raw_headers()	Get all undecoded headers of a mime part as a string.
mime_part:get_headers()	Get all undecoded headers of a mime part as a string.
mime_part:get_content()	Get the parsed content of part.
mime_part:get_raw_content()	Get the raw content of part.
mime_part:get_length()	Get length of the content of the part.
mime_part:get_type()	Extract content-type string of the mime part.
mime_part:get_type_full()	Extract content-type string of the mime part with all attributes.
mime_part:get_detected_type()	Extract content-type string of the mime part as detected by lua_magic.
mime_part:get_detected_type_full()	Extract content-type string of the mime part with all attributes as detected by lua_magic.
mime_part:get_detected_ext()	Returns a msdos extension name according to lua_magic detection.
mime_part:get_cte()	Extract content-transfer-encoding for a part.
mime_part:get_filename()	Extract filename associated with mime part if it is an attachment.
mime_part:is_image()	Returns true if mime part is an image.
mime_part:get_image()	Returns rspamd_image structure associated with this part.
mime_part:is_archive()	Returns true if mime part is an archive.
mime_part:is_attachment()	Returns true if mime part looks like an attachment.
mime_part:get_archive()	Returns rspamd_archive structure associated with this part.
mime_part:is_multipart()	Returns true if mime part is a multipart part.
mime_part:is_message()	Returns true if mime part is a message part (message/rfc822).
mime_part:get_boundary()	Returns boundary for a part (extracted from parent multipart for normal parts and from the part itself for multipart).
mime_part:get_enclosing_boundary()	Returns an enclosing boundary for a part even for multiparts.
mime_part:get_children()	Returns rspamd_mimepart table of part’s children.
mime_part:is_text()	Returns true if mime part is a text part.
mime_part:get_text()	Returns rspamd_textpart structure associated with this part.
mime_part:get_digest()	Returns the unique digest for this mime part.
mime_part:get_id()	Returns the order of the part in parts list.
mime_part:is_broken()	Returns true if mime part has incorrectly specified content type.
mime_part:headers_foreach(callback, [params])	This method calls callback for each header that satisfies some condition.
mime_part:get_parent()	Returns parent part for this part.
mime_part:get_specific()	Returns specific lua content for this part.
mime_part:set_specific(<any>)	Sets a specific content for this part.
mime_part:is_specific(<any>)	Returns true if part has specific lua content.
mime_part:get_urls([need_emails|list_protos][, need_images])	Get all URLs found in a mime part.
mime_part:get_stats()	Returns a table with statistics of the part.

Methods

The module rspamd_mimepart defines the following methods.

Method mime_part:get_header(name[, case_sensitive])

Get decoded value of a header specified with optional case_sensitive flag. By default headers are searched in a case-insensitive manner.

Parameters:

  • name {string}: name of header to get
  • case_sensitive {boolean}: case sensitiveness flag to search for a header

Returns:

  • {string}: decoded value of a header

Method mime_part:get_header_raw(name[, case_sensitive])

Get raw value of a header specified with optional case_sensitive flag. By default headers are searched in a case-insensitive manner.

Parameters:

  • name {string}: name of header to get
  • case_sensitive {boolean}: case sensitiveness flag to search for a header

Returns:

  • {string}: raw value of a header

Method mime_part:get_header_full(name[, case_sensitive])

Get full information about a header specified with optional case_sensitive flag. By default headers are searched in a case-insensitive manner. This method returns a list of tables, one per value of the header, with the following structure:

  • name - name of a header
  • value - raw value of a header
  • decoded - decoded value of a header
  • tab_separated - true if a header and a value are separated by tab character
  • empty_separator - true if there is no separator between a header and a value

Parameters:

  • name {string}: name of header to get
  • case_sensitive {boolean}: case sensitiveness flag to search for a header

Returns:

  • {list of tables}: all values of a header as specified above

Example:

local function check_header_delimiter_tab(part, header_name)
	-- Fall back to an empty list in case no such header is returned
	for _,rh in ipairs(part:get_header_full(header_name) or {}) do
		if rh['tab_separated'] then return true end
	end
	return false
end

Method mime_part:get_header_count(name[, case_sensitive])

Lightweight version if you need just a header’s count. By default headers are searched in a case-insensitive manner.

Parameters:

  • name {string}: name of header to get
  • case_sensitive {boolean}: case sensitiveness flag to search for a header

Returns:

  • {number}: number of header’s occurrences or 0 if not found

Method mime_part:get_raw_headers()

Get all undecoded headers of a mime part as a string

Parameters:

No parameters

Returns:

  • {rspamd_text}: all raw headers for a message as opaque text

Method mime_part:get_headers()

Get all undecoded headers of a mime part as a string

Parameters:

No parameters

Returns:

  • {rspamd_text}: all raw headers for a message as opaque text

Method mime_part:get_content()

Get the parsed content of part

Parameters:

No parameters

Returns:

  • {text}: opaque text object (zero-copy if not cast to a lua string)

Method mime_part:get_raw_content()

Get the raw content of part

Parameters:

No parameters

Returns:

  • {text}: opaque text object (zero-copy if not cast to a lua string)

Method mime_part:get_length()

Get length of the content of the part

Parameters:

No parameters

Returns:

  • {integer}: length of part in bytes

Method mime_part:get_type()

Extract content-type string of the mime part

Parameters:

No parameters

Returns:

  • {string,string}: content type in form ‘type’, ‘subtype’

Method mime_part:get_type_full()

Extract content-type string of the mime part with all attributes

Parameters:

No parameters

Returns:

  • {string,string,table}: content type in form ‘type’, ‘subtype’, {attrs}

Method mime_part:get_detected_type()

Extract content-type string of the mime part as detected by lua_magic

Parameters:

No parameters

Returns:

  • {string,string}: content type in form ‘type’, ‘subtype’

Method mime_part:get_detected_type_full()

Extract content-type string of the mime part with all attributes as detected by lua_magic

Parameters:

No parameters

Returns:

  • {string,string,table}: content type in form ‘type’, ‘subtype’, {attrs}

Method mime_part:get_detected_ext()

Returns a msdos extension name according to lua_magic detection

Parameters:

No parameters

Returns:

  • {string}: detected extension (see lua_magic.types)

Method mime_part:get_cte()

Extract content-transfer-encoding for a part

Parameters:

No parameters

Returns:

  • {string}: content transfer encoding (e.g. base64 or 7bit)

Method mime_part:get_filename()

Extract filename associated with mime part if it is an attachment

Parameters:

No parameters

Returns:

  • {string}: filename or nil if no file is associated with this part

Method mime_part:is_image()

Returns true if mime part is an image

Parameters:

No parameters

Returns:

  • {bool}: true if a part is an image

Method mime_part:get_image()

Returns rspamd_image structure associated with this part. This structure has the following methods:

  • get_width - return width of an image in pixels
  • get_height - return height of an image in pixels
  • get_type - return string representation of image’s type (e.g. ‘jpeg’)
  • get_filename - return string with image’s file name
  • get_size - return size in bytes

Parameters:

No parameters

Returns:

  • {rspamd_image}: image structure or nil if a part is not an image
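
The accessors listed above can be combined as in this sketch, which formats a one-line description of an image part. The helper name image_summary and the output format are assumptions made for illustration; part is assumed to be a mime part from task:get_parts().

```lua
-- Hypothetical helper: describe an image part, or return nil
-- when the part is not an image.
local function image_summary(part)
  if not part:is_image() then
    return nil
  end
  local img = part:get_image()
  if not img then
    return nil
  end
  return string.format('%s %dx%d, %d bytes',
    img:get_type(), img:get_width(), img:get_height(), img:get_size())
end
```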

Method mime_part:is_archive()

Returns true if mime part is an archive

Parameters:

No parameters

Returns:

  • {bool}: true if a part is an archive

Method mime_part:is_attachment()

Returns true if mime part looks like an attachment

Parameters:

No parameters

Returns:

  • {bool}: true if a part looks like an attachment

Method mime_part:get_archive()

Returns rspamd_archive structure associated with this part. This structure has the following methods:

  • get_files - return list of strings with filenames inside archive
  • get_files_full - return list of tables with all information about files
  • is_encrypted - return true if an archive is encrypted
  • get_type - return string representation of archive’s type (e.g. ‘zip’)
  • get_filename - return string with archive’s file name
  • get_size - return size in bytes

Parameters:

No parameters

Returns:

  • {rspamd_archive}: archive structure or nil if a part is not an archive
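
A rule that inspects attachments might gather the archive details in one place, roughly as sketched here. The helper name archive_report and the shape of the returned table are hypothetical; only the archive methods listed above are used.

```lua
-- Hypothetical helper: collect the type, encryption flag and file names
-- of an archive part, or return nil when the part is not an archive.
local function archive_report(part)
  if not part:is_archive() then
    return nil
  end
  local arch = part:get_archive()
  if not arch then
    return nil
  end
  return {
    type = arch:get_type(),
    encrypted = arch:is_encrypted(),
    files = arch:get_files(),
  }
end
```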

Method mime_part:is_multipart()

Returns true if mime part is a multipart part

Parameters:

No parameters

Returns:

  • {bool}: true if a part is a multipart part

Method mime_part:is_message()

Returns true if mime part is a message part (message/rfc822)

Parameters:

No parameters

Returns:

  • {bool}: true if a part is a message part

Method mime_part:get_boundary()

Returns boundary for a part (extracted from parent multipart for normal parts and from the part itself for multipart)

Parameters:

No parameters

Returns:

  • {string}: boundary value or nil

Method mime_part:get_enclosing_boundary()

Returns an enclosing boundary for a part even for multiparts. For normal parts this method is identical to get_boundary

Parameters:

No parameters

Returns:

  • {string}: boundary value or nil

Method mime_part:get_children()

Returns a table of rspamd_mimepart objects for the part’s children. Returns nil if the mime part is neither a multipart nor a message part.

Parameters:

No parameters

Returns:

  • {rspamd_mimepart}: table of children
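
Since get_children() exposes the nesting of a message, a recursive walk is a natural use. The sketch below collects an indented type/subtype line for a part and each of its descendants; dump_tree is a hypothetical helper, and the starting part would typically be a multipart part taken from task:get_parts().

```lua
-- Hypothetical helper: list a part and all of its descendants as
-- indented 'type/subtype' lines.
local function dump_tree(part, depth, out)
  depth = depth or 0
  out = out or {}
  local mtype, subtype = part:get_type()
  out[#out + 1] = string.rep('  ', depth) .. (mtype or '?') .. '/' .. (subtype or '?')
  -- get_children() returns nil for parts that cannot have children
  for _, child in ipairs(part:get_children() or {}) do
    dump_tree(child, depth + 1, out)
  end
  return out
end
```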

Method mime_part:is_text()

Returns true if mime part is a text part

Parameters:

No parameters

Returns:

  • {bool}: true if a part is a text part

Method mime_part:get_text()

Returns rspamd_textpart structure associated with this part.

Parameters:

No parameters

Returns:

  • {rspamd_textpart}: textpart structure or nil if a part is not a text part

Method mime_part:get_digest()

Returns the unique digest for this mime part

Parameters:

No parameters

Returns:

  • {string}: 128 characters hex string with digest of the part

Method mime_part:get_id()

Returns the order of the part in parts list

Parameters:

No parameters

Returns:

  • {number}: index of the part (starting from 1 as it is Lua API)

Method mime_part:is_broken()

Returns true if mime part has incorrectly specified content type

Parameters:

No parameters

Returns:

  • {bool}: true if a part has bad content type

Method mime_part:headers_foreach(callback, [params])

This method calls callback for each header that satisfies some condition. By default, all headers are iterated until callback returns true; returning nil or false continues the iteration. Params can contain the following keys:

  • full: header value is a full table of all attributes (see task:get_header_full for details)
  • regexp: iterate only over headers that satisfy the specified regexp

Parameters:

  • callback {function}: function from header name and header value
  • params {table}: optional parameters

Returns:

No return
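
The stop-on-true behaviour can be sketched as follows. The hypothetical helper first_header_with_prefix returns the value of the first header whose name starts with a given prefix and then ends the iteration. No params table is passed, so the value handed to the callback is assumed to be a plain header value rather than a full table.

```lua
-- Hypothetical helper: value of the first header whose name starts with
-- `prefix` (compared case-insensitively), or nil if there is none.
local function first_header_with_prefix(part, prefix)
  local found
  prefix = prefix:lower()
  part:headers_foreach(function(name, value)
    if name:lower():sub(1, #prefix) == prefix then
      found = value
      return true -- stop iterating
    end
    return false -- continue with the next header
  end)
  return found
end
```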

Method mime_part:get_parent()

Returns parent part for this part

Parameters:

No parameters

Returns:

  • {rspamd_mimepart}: parent part or nil

Method mime_part:get_specific()

Returns specific lua content for this part

Parameters:

No parameters

Returns:

  • {any}: specific lua content

Method mime_part:set_specific(<any>)

Sets a specific content for this part

Parameters:

  • {any}: content to set for this part

Returns:

  • {any}: previous specific lua content (or nil)

Method mime_part:is_specific(<any>)

Returns true if part has specific lua content

Parameters:

No parameters

Returns:

  • {boolean}: flag

Method mime_part:get_urls([need_emails|list_protos][, need_images])

Get all URLs found in a mime part. Telephone urls and emails are not included unless explicitly asked in list_protos

Parameters:

  • need_emails {boolean}: if true, also return email urls; instead of a boolean, this argument can be a comma separated string or a table of the desired protocols (e.g. mailto or telephone)
  • need_images {boolean}: return urls from images (<img src=…>) as well

Returns:

  • {table rspamd_url}: list of all urls found

Method mime_part:get_stats()

Returns a table with the following data:

  • lines: number of lines
  • spaces: number of spaces
  • double_spaces: number of double spaces
  • empty_lines: number of empty lines
  • non_ascii_characters: number of non ascii characters
  • ascii_characters: number of ascii characters

Parameters:

No parameters

Returns:

  • {table}: table of stats
