mindmeld.text_preparation.normalizers module
This module contains Normalizers.
-
class mindmeld.text_preparation.normalizers.ASCIIFold
Bases: mindmeld.text_preparation.normalizers.Normalizer
An ASCII Folding Normalizer.
-
fold_char_to_ascii(char)
Return the ASCII character corresponding to the folding token.
Parameters: char -- ASCII folding token
Returns: an ASCII character
Return type: char
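As a rough illustration of what ASCII folding means, the sketch below approximates it with Python's standard unicodedata module: it decomposes each character and drops the combining marks. It does not call MindMeld, the name fold_to_ascii_sketch is hypothetical, and MindMeld's own folding table may well differ (for example, characters such as "ß" or "ø" have no canonical decomposition and pass through this sketch unchanged).

```python
import unicodedata


def fold_to_ascii_sketch(text: str) -> str:
    """Approximate ASCII folding (hypothetical helper, not MindMeld's API)."""
    # Split accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_to_ascii_sketch("café naïve"))  # cafe naive
```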
-
-
class mindmeld.text_preparation.normalizers.Lowercase
Bases: mindmeld.text_preparation.normalizers.Normalizer
Lowercase Normalizer Class.
-
class mindmeld.text_preparation.normalizers.NFC
Bases: mindmeld.text_preparation.normalizers.Normalizer
Unicode NFC Normalizer Class (Canonical Decomposition, followed by Canonical Composition).
For more details: https://unicode.org/reports/tr15/#Norm_Forms
-
class mindmeld.text_preparation.normalizers.NFD
Bases: mindmeld.text_preparation.normalizers.Normalizer
Unicode NFD Normalizer Class (Canonical Decomposition).
For more details: https://unicode.org/reports/tr15/#Norm_Forms
-
class mindmeld.text_preparation.normalizers.NFKC
Bases: mindmeld.text_preparation.normalizers.Normalizer
Unicode NFKC Normalizer Class (Compatibility Decomposition, followed by Canonical Composition).
For more details: https://unicode.org/reports/tr15/#Norm_Forms
-
class mindmeld.text_preparation.normalizers.NFKD
Bases: mindmeld.text_preparation.normalizers.Normalizer
Unicode NFKD Normalizer Class (Compatibility Decomposition).
For more details: https://unicode.org/reports/tr15/#Norm_Forms
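To make the difference between the four Unicode normalization forms concrete, here is a small sketch using Python's standard unicodedata module rather than the MindMeld classes themselves. The helper name normalize_all is hypothetical; only unicodedata.normalize is a real library call.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_all(text: str) -> dict:
    """Return the text under each normalization form (hypothetical helper)."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


composed = "\u00e9"     # 'é' as a single code point
decomposed = "e\u0301"  # 'e' followed by a combining acute accent
ligature = "\ufb01"     # 'ﬁ' ligature, a compatibility character

# Canonical forms convert between the composed and decomposed spellings.
assert normalize_all(decomposed)["NFC"] == composed
assert normalize_all(composed)["NFD"] == decomposed

# Canonical forms leave compatibility characters alone;
# the compatibility forms (NFKC, NFKD) expand them.
assert normalize_all(ligature)["NFC"] == ligature
assert normalize_all(ligature)["NFKC"] == "fi"
assert normalize_all(ligature)["NFKD"] == "fi"
```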
-
class mindmeld.text_preparation.normalizers.NoOpNormalizer
Bases: mindmeld.text_preparation.normalizers.Normalizer
A No-Op Normalizer.
-
class mindmeld.text_preparation.normalizers.Normalizer
Bases: abc.ABC
Abstract Normalizer Base Class.
-
class mindmeld.text_preparation.normalizers.NormalizerFactory
Bases: object
Normalizer Factory Class.
-
class mindmeld.text_preparation.normalizers.RegexNormalizerRule(pattern: str, replacement: str)
-
class mindmeld.text_preparation.normalizers.RegexNormalizerRuleFactory
Bases: object
-
static get_default_regex_normalizer_rule(regex_normalizer: str)
Creates a RegexNormalizerRule object based on the given rule and the current EXCEPTION_CHARS.
Parameters: regex_normalizer (str) -- Name of the desired RegexNormalizerRule
Returns: Default Regex Normalizer Rule
Return type: RegexNormalizerRule
-
static get_regex_normalizers(regex_norm_rules)
A static method to get a list of RegexNormalizerRule objects from regex_norm_rules.
Parameters: regex_norm_rules (List[Dict], optional) -- Regex normalization rules represented as dictionaries. The example rule below removes any text in parentheses.
{"pattern": "\\(.+?\\)", "replacement": ""}
Returns: List of RegexNormalizerRule objects created from the provided regex_norm_rules.
Return type: regex_normalizer_rules (List[RegexNormalizerRule])
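The effect of a pattern/replacement rule like the one above can be sketched with Python's standard re module. This is an illustration only: apply_rules_sketch is a hypothetical helper, not part of MindMeld, and it assumes each rule is applied as a plain regex substitution in list order. The parentheses in the pattern are escaped so that they match literal parentheses rather than forming a capture group.

```python
import re
from typing import Dict, List


def apply_rules_sketch(text: str, rules: List[Dict[str, str]]) -> str:
    """Apply each pattern/replacement rule in order (hypothetical helper)."""
    for rule in rules:
        text = re.sub(rule["pattern"], rule["replacement"], text)
    return text


# Rule that removes any text in parentheses, including the parentheses.
rules = [{"pattern": r"\(.+?\)", "replacement": ""}]

# The surrounding spaces are kept, so a double space remains after removal.
print(repr(apply_rules_sketch("order (large) pizza", rules)))
```

Because the match is lazy (.+?), each parenthesized group is removed separately rather than everything between the first "(" and the last ")".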
-
EXCEPTION_CHARS = "\\@\\[\\]'"
-
static