mindmeld.core module¶
This module contains a collection of the core data structures used in MindMeld.
-
class mindmeld.core.Bunch(**kwargs)[source]¶
Bases: dict
Dictionary-like object that exposes its keys as attributes.
Inspired by scikit-learn's Bunch objects.
>>> b = Bunch(a=1, b=2)
>>> b['b']
2
>>> b.b
2
>>> b.a = 3
>>> b['a']
3
>>> b.c = 6
>>> b['c']
6
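The attribute access shown in the example can be approximated in a few lines. This is a minimal sketch of the idea, not MindMeld's actual implementation; the class name `AttrDict` is hypothetical.

```python
class AttrDict(dict):
    """Hypothetical stand-in for a Bunch-like dictionary (not MindMeld's code)."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so built-in dict methods such as keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attribute assignments as dictionary items.
        self[name] = value


b = AttrDict(a=1, b=2)
b.a = 3
b.c = 6
```

Raising AttributeError for missing keys keeps helpers such as hasattr() and getattr() with a default behaving as expected.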
-
class mindmeld.core.CallableRegistry[source]¶
Bases: object
A registration class to map callable object names to corresponding objects.
-
functions_registry¶ Getter for functions registry
-
-
class mindmeld.core.Entity(text, entity_type, role=None, value=None, display_text=None, confidence=None)[source]¶
Bases: object
An Entity is any important piece of text that provides more information about the user intent.
-
text¶ str -- The text contents that span the entity
-
type¶ str -- The type of the entity
-
role¶ str -- The role of the entity
-
value¶ dict -- The resolved value of the entity
-
display_text¶ str -- A human readable text representation of the entity for use in natural language responses.
-
confidence¶ float -- A confidence value from 0 to 1 about how confident the entity recognizer was for the given class label.
-
static from_cache_typed(obj)[source]¶ Instantiates a cached Entity using the class type that was serialized when its to_cache() function was called.
-
static is_system_entity(entity_type)[source] Checks whether the provided entity type is a MindMeld-recognized system entity.
Parameters: entity_type (str) -- An entity type
Returns: True if the entity is a system entity type, else False
Return type: bool
-
entity_class_map= {'Entity': <class 'mindmeld.core.Entity'>, 'NestedEntity': <class 'mindmeld.core.NestedEntity'>, 'QueryEntity': <class 'mindmeld.core.QueryEntity'>}¶
-
-
class mindmeld.core.FormEntity(entity: str, role: Optional[str] = None, responses: Optional[List[str]] = None, retry_response: Optional[List[str]] = None, value: Optional[Dict] = None, default_eval: Optional[bool] = True, hints: Optional[List[str]] = None, custom_eval: Optional[str] = None)[source]¶
Bases: object
A form entity is used for defining custom objects for the entity form used in AutoEntityFilling (slot-filling).
-
entity¶ str -- Entity name
-
role¶ str, optional -- The role of the entity
-
responses¶ list/str, optional -- Message(s) for prompting the user for missing entities
-
retry_response¶ list/str, optional -- Message(s) for re-prompting users. If not provided, defaults to responses.
-
value¶ str, optional -- The resolved value of the entity
-
default_eval¶ bool, optional -- Use system validation (default: True)
-
hints¶ list, optional -- Developer-defined list of keywords to verify the user input against
-
custom_eval¶ str, optional -- Name of a custom validation function. The function should return either a bool (validated or not) or a custom resolved value for the entity. If a custom resolved value is returned, the slot response is considered to be valid.
-
-
class mindmeld.core.NestedEntity(texts, spans, token_spans, entity, children=None)[source]¶
Bases: object
An entity with the context of the query it came from, along with information like the entity's parent and children.
-
texts¶ tuple -- Tuple containing the three forms of text: raw text, processed text, and normalized text
-
spans¶ tuple -- Tuple containing the character index spans of the text for this entity for each text form
-
token_spans¶ tuple -- Tuple containing the token index spans of the text for this entity for each text form
-
entity¶ Entity -- The entity object
-
parent¶ NestedEntity -- The parent of the nested entity
-
children¶ tuple of NestedEntity -- A tuple of children nested entities
-
classmethod from_query(query, span=None, normalized_span=None, entity_type=None, role=None, entity=None, parent_offset=None, children=None)[source]¶ Creates an entity node using a parent entity node
Parameters: - query (Query) -- Description
- span (Span) -- The span of the entity in the query's raw text
- normalized_span (None, optional) -- The span of the entity in the query's normalized text
- entity_type (str, optional) -- The entity type. Either this or entity must be provided.
- role (str, optional) -- The entity role. Ignored if entity is provided.
- entity (Entity, optional) -- The entity. Either this or entity_type must be provided.
- parent_offset (int) -- The offset of the parent in the query
- children (None, optional) -- Description
Returns: the created entity
-
static get_largest_non_overlapping_entities(candidates, get_span_func)[source]¶ Filters out overlapping entity spans
Parameters: - candidates (iterable) -- An iterable of candidates to filter based on span
- get_span_func (function) -- A function that accesses the span from each candidate
Returns: A list of non-overlapping candidates
Return type: list
-
normalized_span¶ The span of the normalized text
-
normalized_text¶ The normalized input text
-
normalized_token_span¶ The token span of the normalized text
-
processed_span¶ The span of the preprocessed text
-
processed_text¶ The input text after it has been preprocessed
-
processed_token_span¶ The token span of the preprocessed text
-
span¶ The span of the original input text
-
text¶ The original input text span
-
token_span¶ The token span of the original input text
-
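The filtering performed by get_largest_non_overlapping_entities can be pictured with a simple greedy strategy. The sketch below is one plausible approach under stated assumptions, not MindMeld's actual algorithm: the function name `largest_non_overlapping` is hypothetical, candidates are plain dictionaries, and spans are assumed to be (start, end) tuples with an inclusive end.

```python
def largest_non_overlapping(candidates, get_span_func):
    """Greedy sketch: prefer longer spans, drop candidates overlapping a kept one."""

    def length(candidate):
        start, end = get_span_func(candidate)
        return end - start

    kept = []
    # sorted() is stable, so equally long candidates keep their input order.
    for candidate in sorted(candidates, key=length, reverse=True):
        start, end = get_span_func(candidate)
        # With inclusive ends, two spans are disjoint only if one
        # ends strictly before the other starts.
        if all(end < s or start > e for s, e in map(get_span_func, kept)):
            kept.append(candidate)
    return kept


candidates = [
    {"text": "york", "span": (4, 7)},
    {"text": "new york", "span": (0, 7)},
    {"text": "pizza", "span": (12, 16)},
]
kept = largest_non_overlapping(candidates, lambda c: c["span"])
```

Here "york" is discarded because it overlaps the longer "new york". The result is ordered by span length rather than by position in the text.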
-
class mindmeld.core.ProcessedQuery(query, domain=None, intent=None, entities=None, is_gold=False, nbest_transcripts_queries=None, nbest_transcripts_entities=None, nbest_aligned_entities=None, confidence=None)[source]¶
Bases: object
A processed query contains a query and the additional metadata that has been labeled or predicted.
-
query¶ Query -- The underlying query object.
-
domain¶ str -- The domain of the query
-
entities¶ list -- A list of entities present in this query
-
intent¶ str -- The intent of the query
-
is_gold¶ bool -- Indicates whether the details in this query were predicted or human labeled
-
nbest_transcripts_queries¶ list -- A list of n best transcript queries
-
nbest_transcripts_entities¶ list -- A list of lists of entities for each query
-
nbest_aligned_entities¶ list -- A list of lists of aligned entities
-
confidence¶ dict -- A dictionary of the class probabilities for the domain and intent classifiers
-
-
class mindmeld.core.Query(raw_text, processed_text, normalized_tokens, char_maps, locale=None, language=None, time_zone=None, timestamp=None, stemmed_tokens=None)[source]¶
Bases: object
The query object is responsible for processing and normalizing raw user text input so that classifiers can deal with it. A query stores three forms of text: raw text, processed text, and normalized text. The query object is also responsible for translating text ranges across these forms.
-
raw_text¶ str -- the original input text
-
processed_text¶ str -- the text after it has been preprocessed. The pre-processing happens at the application level and is generally used for special characters
-
normalized_tokens¶ tuple of str -- a tuple of normalized tokens
-
system_entity_candidates¶ tuple -- A tuple of system entities extracted from the text
-
locale¶ str, optional -- The locale representing the ISO 639-1 language code and ISO 3166-1 alpha-2 country code separated by an underscore character.
-
language¶ str, optional -- The language code representing ISO 639-1 language codes.
-
time_zone¶ str -- The IANA id for the time zone in which the query originated such as 'America/Los_Angeles'
-
timestamp¶ long, optional -- A Unix timestamp used as the reference time. If not specified, the current system time is used. If time_zone is not also specified, this parameter is ignored.
-
stemmed_tokens¶ list -- A sequence of stemmed tokens for the query text
-
get_system_entity_candidates(sys_types)[source]¶
Parameters: sys_types (set of str) -- A set of entity types to select
Returns: Candidate system entities of the types specified
Return type: list
-
get_text_form(form)[source]¶ Programmatically retrieves text by form
Parameters: form (int) -- A valid text form (TEXT_FORM_RAW, TEXT_FORM_PROCESSED, or TEXT_FORM_NORMALIZED)
Returns: The requested text
Return type: str
-
get_verbose_normalized_tokens()[source]¶ This function returns a list of dictionaries containing details of each normalized token
-
transform_index(index, form_in, form_out)[source]¶ Transforms a text index from one form to another.
Parameters: Returns: the equivalent index of text in the output form
Return type:
-
transform_span(text_span, form_in, form_out)[source]¶ Transforms a text range from one form to another.
Parameters: Returns: the equivalent range of text in the output form
Return type:
-
language Language of the query, specified using an ISO 639-2 code.
-
locale The locale representing the ISO 639-1/2 language code and ISO 3166-1 alpha-2 country code separated by an underscore character.
-
normalized_text¶ The normalized input text
-
normalized_tokens The tokens of the normalized input text
-
processed_text The input text after it has been preprocessed
-
stemmed_text¶ The stemmed input text
-
text¶ The original input text
-
time_zone The IANA id for the time zone in which the query originated such as 'America/Los_Angeles'.
-
timestamp A Unix timestamp for when the query was created. If time_zone is None, this parameter is ignored.
-
-
class mindmeld.core.QueryEntity(texts, spans, token_spans, entity, children=None)[source]¶
Bases: mindmeld.core.NestedEntity
An entity with the context of the query it came from.
-
text¶ str -- The raw text that was processed into this entity
-
processed_text¶ str -- The processed text that was processed into this entity
-
normalized_text¶ str -- The normalized text that was processed into this entity
-
span¶ Span -- The character index span of the raw text that was processed into this entity
-
processed_span¶ Span -- The character index span of the processed text that was processed into this entity
-
start¶ int -- The character index start of the text range that was processed into this entity. This index is based on the normalized text of the query passed in.
-
end¶ int -- The character index end of the text range that was processed into this entity. This index is based on the normalized text of the query passed in.
-
-
class mindmeld.core.Span(start, end)[source]¶
Bases: object
Object representing a text span with start and end indices
-
start¶ int -- The index from the original text that represents the start of the span
-
end¶ int -- The index from the original text that represents the end of the span
-
static get_largest_non_overlapping_candidates(spans)[source]¶ Finds the set of the largest non-overlapping candidates.
Parameters: spans (list) -- List of tuples representing candidate spans (start_index, end_index + 1).
Returns: List of the largest non-overlapping spans.
Return type: selected_spans (list)
-
shift(offset)[source]¶ Shifts a span by offset
Parameters: offset (int) -- The number to change start and end by
-
slice(obj)[source]¶ Returns the slice of the object for this span
Parameters: obj -- The object to slice
Returns: The slice of the passed-in object for this span
-
end
-
start
-
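To make shift and slice concrete, here is a minimal sketch of a span-like class. It is an illustration rather than MindMeld's code: the class name `SimpleSpan` is hypothetical, `end` is assumed to be inclusive, and `shift` is assumed to return a new span instead of modifying the existing one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleSpan:
    """Hypothetical stand-in for a span; `end` is assumed to be inclusive."""
    start: int
    end: int

    def shift(self, offset):
        # Return a new span with both indices moved by `offset`.
        return SimpleSpan(self.start + offset, self.end + offset)

    def slice(self, obj):
        # Inclusive end, hence the +1 when converting to a Python slice.
        return obj[self.start:self.end + 1]


text = "set alarm for 6 am"
span = SimpleSpan(4, 8)
word = span.slice(text)             # "alarm"
time = span.shift(10).slice(text)   # "6 am"
tokens = SimpleSpan(1, 2).slice(text.split())  # ["alarm", "for"]
```

Because slice only relies on indexing, the same span can be applied to a string of characters or to a list of tokens.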
-
mindmeld.core.resolve_entity_conflicts(query_entities)[source]¶ Takes a list of query entities for a query, resolves any entity conflicts, and returns the resolved list.
- If two entities in a query conflict with each other, the following logic applies:
- If the target entity is a subset of another entity, the target entity is deleted.
- If the target entity has the same span as another entity, the one with the higher confidence is kept.
- If the target entity overlaps with another entity, the one with the higher confidence is kept.
Parameters: query_entities (list of QueryEntity) -- A list of query entities to resolve
Returns: A filtered list of query entities
Return type: list of QueryEntity
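The three conflict rules listed for resolve_entity_conflicts can be sketched on simplified data. This is an illustration under stated assumptions, not MindMeld's implementation: `Candidate`, `loses_to`, and `resolve_conflicts` are hypothetical names, spans are assumed to have an inclusive end, and every candidate is assumed to carry a single numeric confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a query entity (not a MindMeld class)."""
    label: str
    start: int        # first character index of the span
    end: int          # last character index of the span (assumed inclusive)
    confidence: float


def loses_to(a, b):
    """Return True if `a` should be dropped in favour of `b`."""
    if a.end < b.start or b.end < a.start:
        return False  # spans do not touch, so there is no conflict
    if (a.start, a.end) != (b.start, b.end):
        if b.start <= a.start and a.end <= b.end:
            return True   # a is a subset of b
        if a.start <= b.start and b.end <= a.end:
            return False  # b is a subset of a
    # Identical span or partial overlap: the lower confidence loses.
    return a.confidence < b.confidence


def resolve_conflicts(candidates):
    """Keep every candidate that does not lose to any other candidate."""
    return [
        a for a in candidates
        if not any(loses_to(a, b) for b in candidates if b is not a)
    ]


candidates = [
    Candidate("city", 0, 7, 0.6),
    Candidate("state", 4, 7, 0.9),     # subset of "city": dropped
    Candidate("time", 12, 17, 0.4),    # same span as "number": lower confidence
    Candidate("number", 12, 17, 0.8),
    Candidate("dish", 20, 28, 0.7),
    Candidate("option", 26, 32, 0.5),  # overlaps "dish": lower confidence
]
resolved = resolve_conflicts(candidates)
```

Each candidate is compared against the full input list, so chains of conflicts are handled naively in this sketch, and candidates with equal confidence are both kept.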