mindmeld.auto_annotator module

class mindmeld.auto_annotator.Annotator(app_path, annotation_rules=None, language='en', locale='en_US', overwrite=False, unannotate_supported_entities_only=True, unannotation_rules=None, **kwargs)[source]

Bases: abc.ABC

Abstract Annotator class that can be used to build a custom annotator class.

annotate()[source]

Annotate data.

parse(sentence, **kwargs)[source]

Extract entities from a sentence. Detected entities should be represented as dictionaries with the following keys: "body", "start" (start index), "end" (end index), "value", "dim" (entity type).

Parameters:sentence (str) -- Sentence in which to detect entities.
Returns:List of QueryEntity objects.
Return type:query_entities (list)
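
The entity dictionary format described for parse() can be illustrated with a small standalone sketch. This is not MindMeld code: the function name parse_numbers, the regular expression, and the shape of "value" are assumptions made for illustration, and "end" is treated as exclusive (Python's slicing convention), which this reference does not specify.

```python
import re


def parse_numbers(sentence):
    """Toy detector (hypothetical): one entity dictionary per run of digits."""
    entities = []
    for match in re.finditer(r"\d+", sentence):
        entities.append(
            {
                "body": match.group(),        # matched text
                "start": match.start(),       # start index in the sentence
                "end": match.end(),           # end index (exclusive here, an assumption)
                "value": int(match.group()),  # resolved value (shape is an assumption)
                "dim": "sys_number",          # entity type
            }
        )
    return entities


entities = parse_numbers("order 2 pizzas and 12 drinks")
```

A real subclass of Annotator would return its detections from parse() rather than from a free function like this one.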
unannotate()[source]

Unannotate data.

valid_entity_check(entity)[source]

Determine if an entity type is valid.

Parameters:entity (str) -- Name of entity to annotate.
Returns:Whether entity is valid.
Return type:bool
supported_entity_types

Returns:List of supported entity types.
Return type:supported_entity_types (list)

class mindmeld.auto_annotator.AnnotatorAction[source]

Bases: enum.Enum

An enumeration.

ANNOTATE = 'annotate'
UNANNOTATE = 'unannotate'
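
As a brief illustration of how the two members behave, the sketch below defines a stand-in enum with the same names and values so that it runs without MindMeld installed. The helper action_from_string is hypothetical and not part of the module.

```python
from enum import Enum


class AnnotatorAction(Enum):
    """Stand-in mirroring the documented members; not imported from MindMeld."""

    ANNOTATE = "annotate"
    UNANNOTATE = "unannotate"


def action_from_string(name):
    """Hypothetical helper: look up a member by its string value."""
    return AnnotatorAction(name)


action = action_from_string("unannotate")
```

Looking up a value that is not defined, such as "delete", raises ValueError, which is standard Enum behavior.
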
class mindmeld.auto_annotator.BootstrapAnnotator(*args, **kwargs)[source]

Bases: mindmeld.auto_annotator.Annotator

Bootstrap Annotator class used to generate annotations based on existing annotations.

parse(sentence, entity_types, domain: str, intent: str, **kwargs)[source]
Parameters:
  • sentence (str) -- Sentence in which to detect entities.
  • entity_types (list) -- List of entity types to parse. If None, all possible entity types will be parsed.
  • domain (str) -- Allowed domain.
  • intent (str) -- Allowed intent.
Returns:

List of QueryEntity objects.

Return type:

query_entities (list)

text_queries_to_processed_queries(text_queries: List[str])[source]

Converts text queries into processed queries.

Parameters:text_queries (List[str]) -- List of raw text queries.
Returns:List of processed queries.
Return type:processed_queries (List[ProcessedQuery])
valid_entity_check(entity)[source]

Determine if an entity type is valid.

Parameters:entity (str) -- Name of entity to annotate.
Returns:Whether entity is valid.
Return type:bool
supported_entity_types

Returns:List of supported entity types.
Return type:supported_entity_types (list)

class mindmeld.auto_annotator.MultiLingualAnnotator(*args, **kwargs)[source]

Bases: mindmeld.auto_annotator.Annotator

The MultiLingualAnnotator detects entities in English and non-English sentences.

  1. If the 'language' is English, this annotator uses only Spacy's English NER model to
    detect entities.
  2. If the 'language' is not English, this annotator detects entities using both Spacy
    non-English NER models and a Duckling-based annotator:
    A. The TranslationDucklingAnnotator is used if a 'translator' service is available (e.g. "GoogleTranslator"). Non-English Duckling candidates are matched to English entities detected by Spacy's English NER model.
    B. The NoTranslationDucklingAnnotator is used if a 'translator' service is not available. The set of non-English Duckling candidates with the largest non-overlapping spans is selected.
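
The selection logic described above can be summarized in a short sketch. The function choose_strategy and the string labels it returns are hypothetical and only restate the documented rules; they do not reflect how MindMeld wires its annotators together internally.

```python
def choose_strategy(language, translator=None):
    """Hypothetical summary of which components handle a sentence."""
    if language == "en":
        # Case 1: English uses only Spacy's English NER model.
        return ["spacy_english_ner"]
    # Case 2: non-English uses Spacy non-English NER plus a Duckling-based annotator.
    if translator:
        duckling = "TranslationDucklingAnnotator"      # case 2A: translator available
    else:
        duckling = "NoTranslationDucklingAnnotator"    # case 2B: no translator
    return ["spacy_non_english_ner", duckling]


english = choose_strategy("en")
spanish_with_translator = choose_strategy("es", translator="GoogleTranslator")
spanish_without_translator = choose_strategy("es")
```
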
parse(sentence, entity_types=None, **kwargs)[source]
Parameters:
  • sentence (str) -- Sentence in which to detect entities.
  • entity_types (list) -- List of entity types to parse. If None, all possible entity types will be parsed.
Returns:

List of QueryEntity objects.

Return type:

query_entities (list)

supported_entity_types

Returns:List of supported entity types.
Return type:supported_entity_types (list)

class mindmeld.auto_annotator.NoTranslationDucklingAnnotator(*args, **kwargs)[source]

Bases: mindmeld.auto_annotator.Annotator

The NoTranslationDucklingAnnotator detects entities by filtering non-English candidates from Duckling to a set containing the largest non-overlapping spans.

Unlike the TranslationDucklingAnnotator, this annotator does not use a translation service. Unlike the MultiLingualAnnotator, this annotator does not use non-English Spacy NER models.
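
One plausible way to reduce candidates to "the largest non-overlapping spans" is a greedy pass that prefers longer spans. The sketch below assumes candidates are dictionaries with "start" and "end" keys and an exclusive "end"; the function name and the tie-breaking rule are illustrative assumptions, and MindMeld's actual selection may differ.

```python
def largest_non_overlapping(candidates):
    """Greedily keep the longest candidates whose spans do not overlap."""
    selected = []
    # Visit the longest spans first; ties go to the earliest start.
    ordered = sorted(candidates, key=lambda c: (-(c["end"] - c["start"]), c["start"]))
    for cand in ordered:
        overlaps = any(
            cand["start"] < kept["end"] and kept["start"] < cand["end"]
            for kept in selected
        )
        if not overlaps:
            selected.append(cand)
    # Return the survivors in sentence order.
    return sorted(selected, key=lambda c: c["start"])


candidates = [
    {"start": 0, "end": 10, "dim": "sys_time"},
    {"start": 0, "end": 4, "dim": "sys_number"},
    {"start": 12, "end": 15, "dim": "sys_number"},
]
kept = largest_non_overlapping(candidates)
```

Here the shorter "sys_number" candidate at 0-4 is dropped because it lies inside the longer "sys_time" span, while the candidate at 12-15 is kept.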

parse(sentence, entity_types=None, **kwargs)[source]
Parameters:
  • sentence (str) -- Sentence in which to detect entities.
  • entity_types (list) -- List of entity types to parse. If None, all possible entity types will be parsed.
Returns:

List of QueryEntity objects.

Return type:

query_entities (list)

supported_entity_types

Returns:List of supported entity types.
Return type:supported_entity_types (list)

class mindmeld.auto_annotator.SpacyAnnotator(*args, **kwargs)[source]

Bases: mindmeld.auto_annotator.Annotator

Annotator class that uses Spacy to generate annotations. Depending on the language, supported entities can include: "sys_time", "sys_interval", "sys_duration", "sys_number", "sys_amount-of-money", "sys_distance", "sys_weight", "sys_ordinal", "sys_quantity", "sys_percent", "sys_org", "sys_loc", "sys_person", "sys_gpe", "sys_norp", "sys_fac", "sys_product", "sys_event", "sys_law", "sys_language", "sys_work-of-art", "sys_other-quantity". For more information on the entities supported by the SpacyAnnotator, see the MindMeld docs.

parse(sentence, entity_types=None, **kwargs)[source]

Extracts entities from a sentence. Detected entities are represented as dictionaries with the following keys: "body", "start" (start index), "end" (end index), "value", "dim" (entity type).

Parameters:
  • sentence (str) -- Sentence in which to detect entities.
  • entity_types (list) -- List of entity types to annotate. If None, all possible entity types will be annotated.
Returns:

List of QueryEntity objects.

Return type:

query_entities (list)

supported_entity_types

This property generates a list of supported entities for the given language. These entity labels are mapped to MindMeld sys_entities. The "misc" Spacy entity is skipped since the category is too broad to be helpful in an application.

Returns:List of supported entity types.
Return type:supported_entity_types (list)
class mindmeld.auto_annotator.TranslationDucklingAnnotator(*args, **kwargs)[source]

Bases: mindmeld.auto_annotator.Annotator

The TranslationDucklingAnnotator detects entities in non-English sentences using a translation service and Duckling by following these steps:

  1. The non-English sentence is translated to English.
  2. Spacy detects entities in the translated English sentence.
  3. Duckling detects non-English entities in the non-English sentence.
  4. A heuristic in parse() is used to match and filter the non-English entities against the English entities.
  5. The final set of filtered non-English entities is returned.

Unlike the NoTranslationDucklingAnnotator, this annotator uses a translation service. Unlike the MultiLingualAnnotator, this annotator does not use non-English Spacy NER models.
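
The five steps can be sketched as a small pipeline. Everything here is illustrative: the three callables are hypothetical stand-ins for a translation service, Spacy, and Duckling, and matching on entity type ("dim") is only one possible heuristic, not necessarily the one parse() implements.

```python
def translation_duckling_parse(sentence, translate, spacy_english_entities, duckling_entities):
    """Hypothetical pipeline mirroring the documented steps."""
    english_sentence = translate(sentence)                # step 1: translate to English
    english = spacy_english_entities(english_sentence)    # step 2: English entities
    candidates = duckling_entities(sentence)              # step 3: non-English candidates
    # Step 4 (assumed heuristic): keep candidates whose type was also found in English.
    english_types = {entity["dim"] for entity in english}
    # Step 5: return the filtered non-English entities.
    return [cand for cand in candidates if cand["dim"] in english_types]


# Stub services with canned outputs, for demonstration only.
filtered = translation_duckling_parse(
    "mañana a las tres",
    translate=lambda s: "tomorrow at three",
    spacy_english_entities=lambda s: [{"body": "tomorrow at three", "dim": "sys_time"}],
    duckling_entities=lambda s: [
        {"body": "mañana a las tres", "start": 0, "end": 17, "dim": "sys_time"},
        {"body": "tres", "start": 13, "end": 17, "dim": "sys_number"},
    ],
)
```

With these stubs, only the "sys_time" candidate survives because no English "sys_number" entity was detected.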

parse(sentence, entity_types=None, **kwargs)[source]

Implements a heuristic to match English entities detected by Spacy on the translated non-English sentence against the non-English entities detected by Duckling on the non-English sentence.

Parameters:
  • sentence (str) -- Sentence in which to detect entities.
  • entity_types (list) -- List of entity types to parse. If None, all possible entity types will be parsed.
Returns:

List of QueryEntity objects.

Return type:

query_entities (list)

supported_entity_types

Returns:List of supported entity types.
Return type:supported_entity_types (list)

mindmeld.auto_annotator.register_all_annotators()[source]