mindmeld.text_preparation.normalizers module

This module contains Normalizers.

class mindmeld.text_preparation.normalizers.ASCIIFold

Bases: mindmeld.text_preparation.normalizers.Normalizer

An ASCII Folding Normalizer.

fold_char_to_ascii(char)

Return the ASCII character corresponding to the folding token.

Parameters:	char -- ASCII folding token
Returns:	an ASCII character
Return type:	str

fold_str_to_ascii(text)

Return the ASCII-folded form of the folding token string.

Parameters:	text (str) -- ASCII folding token string
Returns:	the ASCII-folded string
Return type:	str

static load_ascii_folding_table()

Load the mapping of code points to ASCII characters.

normalize(text)

Parameters:	text (str) -- Input text.
Returns:	Normalized text.
Return type:	str
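
To give a feel for what ASCII folding does, the sketch below approximates it with the standard library only: decompose with NFKD, then drop the combining marks. This is not MindMeld's implementation, which relies on the folding table loaded by load_ascii_folding_table(); the helper name approx_ascii_fold is hypothetical.

```python
import unicodedata


def approx_ascii_fold(text: str) -> str:
    """Approximate ASCII folding (hypothetical helper, not the MindMeld table)."""
    # Decompose characters, e.g. "é" becomes "e" + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(approx_ascii_fold("Café Zürich"))  # Cafe Zurich
```

Characters with no Unicode decomposition (for example "ø" or "ß") pass through this sketch unchanged, which is one reason a dedicated folding table is useful.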

class mindmeld.text_preparation.normalizers.Lowercase

Bases: mindmeld.text_preparation.normalizers.Normalizer

Lowercase Normalizer Class.

normalize(text)

Parameters:	text (str) -- Input text.
Returns:	Normalized text.
Return type:	str

class mindmeld.text_preparation.normalizers.NFC

Bases: mindmeld.text_preparation.normalizers.Normalizer

Unicode NFC Normalizer Class. (Canonical Decomposition, followed by Canonical Composition)

For more details: https://unicode.org/reports/tr15/#Norm_Forms

normalize(text)

Parameters:	text (str) -- Input text.
Returns:	Normalized text.
Return type:	str

class mindmeld.text_preparation.normalizers.NFD

Bases: mindmeld.text_preparation.normalizers.Normalizer

Unicode NFD Normalizer Class. (Canonical Decomposition)

For more details: https://unicode.org/reports/tr15/#Norm_Forms

normalize(text)

Parameters:	text (str) -- Input text.
Returns:	Normalized text.
Return type:	str
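
The difference between the two canonical forms can be seen with Python's standard unicodedata module. This sketch calls the standard library directly, not the MindMeld classes, and the variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"      # "é" as one precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# NFD splits the precomposed character into base + combining mark.
nfd = unicodedata.normalize("NFD", composed)
# NFC recombines base + combining mark into the precomposed character.
nfc = unicodedata.normalize("NFC", decomposed)

print(len(composed), len(nfd))  # 1 2
print(nfc == composed)          # True
```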

class mindmeld.text_preparation.normalizers.NFKC

Bases: mindmeld.text_preparation.normalizers.Normalizer

Unicode NFKC Normalizer Class. (Compatibility Decomposition, followed by Canonical Composition)

For more details: https://unicode.org/reports/tr15/#Norm_Forms

normalize(text)

Parameters:	text (str) -- Input text.
Returns:	Normalized text.
Return type:	str

class mindmeld.text_preparation.normalizers.NFKD

Bases: mindmeld.text_preparation.normalizers.Normalizer

Unicode NFKD Normalizer Class. (Compatibility Decomposition)

For more details: https://unicode.org/reports/tr15/#Norm_Forms

normalize(text)

Parameters:	text (str) -- Input text.
Returns:	Normalized text.
Return type:	str
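
The compatibility forms differ from the canonical ones in that they also replace compatibility characters such as ligatures. The sketch below shows this with the standard unicodedata module, not with the MindMeld classes.

```python
import unicodedata

ligature = "\ufb01"  # LATIN SMALL LIGATURE FI

# Canonical forms leave compatibility characters such as ligatures alone.
nfc = unicodedata.normalize("NFC", ligature)
# Compatibility forms replace them with their plain equivalents.
nfkc = unicodedata.normalize("NFKC", ligature)
# NFKD also leaves accents decomposed: "é" + ligature becomes e, U+0301, f, i.
nfkd = unicodedata.normalize("NFKD", "\u00e9\ufb01")

print(nfc == ligature)  # True
print(nfkc)             # fi
print(len(nfkd))        # 4
```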

class mindmeld.text_preparation.normalizers.NoOpNormalizer

Bases: mindmeld.text_preparation.normalizers.Normalizer

A no-op Normalizer.

normalize(text)

Parameters:	text (str) -- Input text.
Returns:	The original text, unchanged.
Return type:	str

class mindmeld.text_preparation.normalizers.Normalizer

Bases: abc.ABC

Abstract Normalizer Base Class.

normalize(text)

Parameters:	text (str) -- Input text.
Returns:	Normalized text.
Return type:	str

tojson()

Return this Normalizer's part of the recursive JSON representation of a TextPreparationPipeline.

Returns:	JSON representation of the Normalizer.
Return type:	dict
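
The abstract-base-class pattern described above can be sketched as follows. The Normalizer class here is a stand-in written for this example, assuming normalize() is the abstract method, and CollapseWhitespace is a hypothetical subclass; neither is taken from the MindMeld source.

```python
from abc import ABC, abstractmethod


class Normalizer(ABC):
    """Stand-in for the abstract base class documented above (an assumption)."""

    @abstractmethod
    def normalize(self, text: str) -> str:
        """Return the normalized form of text."""


class CollapseWhitespace(Normalizer):
    """Hypothetical custom normalizer that collapses runs of whitespace."""

    def normalize(self, text: str) -> str:
        # str.split() with no arguments splits on any run of whitespace.
        return " ".join(text.split())


print(CollapseWhitespace().normalize("  hello   world "))  # hello world
```

Because normalize() is abstract in this sketch, instantiating the stand-in base class directly raises TypeError; only concrete subclasses can be created.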

class mindmeld.text_preparation.normalizers.NormalizerFactory

Bases: object

Normalizer Factory Class.

static get_normalizer(normalizer: str)

A static method to get a Normalizer by name.

Parameters:	normalizer (str) -- Name of the desired Normalizer class
Returns:	The requested Normalizer
Return type:	Normalizer
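
A name-based factory of this kind is commonly built on a dictionary keyed by class name. The sketch below illustrates that idea with stand-in classes. The registry, the choice to return an instance, and the ValueError for unknown names are all assumptions made for the example, not documented MindMeld behavior.

```python
class Lowercase:
    """Stand-in normalizer for this sketch."""

    def normalize(self, text: str) -> str:
        return text.lower()


class NoOpNormalizer:
    """Stand-in normalizer that returns its input unchanged."""

    def normalize(self, text: str) -> str:
        return text


# Hypothetical registry keyed by class name.
_REGISTRY = {cls.__name__: cls for cls in (Lowercase, NoOpNormalizer)}


def get_normalizer(name: str):
    """Look up a normalizer by class name and return an instance."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown normalizer: {name!r}") from None


print(get_normalizer("Lowercase").normalize("Hello World"))  # hello world
```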

class mindmeld.text_preparation.normalizers.RegexNormalizerRule(pattern: str, replacement: str)

Bases: mindmeld.text_preparation.normalizers.Normalizer

normalize(s)

Parameters:	s (str) -- Input text.
Returns:	Normalized text.
Return type:	str

tojson()

Return this rule's part of the recursive JSON representation of a TextPreparationPipeline.

Returns:	JSON representation of the RegexNormalizerRule.
Return type:	dict
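
A pattern/replacement rule like this can be sketched with the standard re module. RegexRuleSketch is a hypothetical stand-in written for this example; it assumes the rule replaces every match of the pattern, which the documentation above does not spell out.

```python
import re


class RegexRuleSketch:
    """Hypothetical stand-in for a pattern/replacement rule."""

    def __init__(self, pattern: str, replacement: str):
        # Compile once so repeated calls to normalize() are cheap.
        self.pattern = re.compile(pattern)
        self.replacement = replacement

    def normalize(self, s: str) -> str:
        # Replace every match of the pattern with the replacement string.
        return self.pattern.sub(self.replacement, s)


rule = RegexRuleSketch(r"\s+", " ")
print(rule.normalize("too    many   spaces"))  # too many spaces
```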

class mindmeld.text_preparation.normalizers.RegexNormalizerRuleFactory

Bases: object

static get_default_regex_normalizer_rule(regex_normalizer: str)

Creates a RegexNormalizerRule object based on the given rule and the current EXCEPTION_CHARS.

Parameters:	regex_normalizer (str) -- Name of the desired RegexNormalizerRule
Returns:	Default Regex Normalizer Rule
Return type:	RegexNormalizerRule

static get_regex_normalizers(regex_norm_rules)

A static method to get a list of RegexNormalizerRule objects from regex_norm_rules.

Parameters:	regex_norm_rules (List[Dict], optional) -- Regex normalization rules represented as dictionaries. The example rule below removes any text in parentheses.

    {"pattern": "\\(.+?\\)", "replacement": ""}

Returns:	List of RegexNormalizerRule objects created from the provided regex_norm_rules.
Return type:	List[RegexNormalizerRule]
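
To show how rule dictionaries of this shape behave, the sketch below applies them in order with the standard re module, not the MindMeld factory. The parentheses are escaped so they match literally. The second rule and the apply_rules helper are made up for the example.

```python
import re

# Rule dictionaries in the shape described above; the second rule is made up.
regex_norm_rules = [
    {"pattern": "\\(.+?\\)", "replacement": ""},  # remove text in parentheses
    {"pattern": " {2,}", "replacement": " "},      # collapse repeated spaces
]

# Compile each rule once, keeping the given order.
compiled = [(re.compile(r["pattern"]), r["replacement"]) for r in regex_norm_rules]


def apply_rules(text: str) -> str:
    """Apply every rule in sequence (hypothetical helper)."""
    for pattern, replacement in compiled:
        text = pattern.sub(replacement, text)
    return text


print(apply_rules("order a latte (large) to go"))  # order a latte to go
```

Order matters in this sketch: removing the parenthesized text leaves a double space, which the second rule then collapses.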

EXCEPTION_CHARS = "\\@\\[\\]'"