mindmeld.components.nlp module

This module contains the natural language processor.

class mindmeld.components.nlp.DomainProcessor(app_path, domain, resource_loader=None, progress_bar=None)[source]

Bases: mindmeld.components.nlp.Processor

The domain processor houses the hierarchy of domain-specific natural language processing models required for understanding the user input for a particular domain.

name

str -- The name of the domain.

intent_classifier

IntentClassifier -- The intent classifier for this domain.

inspect(query, intent=None, dynamic_resource=None)[source]

Inspects the query.

Parameters:
  • query (Query) -- The query to be predicted.
  • intent (str) -- The expected intent label for this query.
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.
Returns:

A 2D list that contains each feature along with its value, weight, and probability.

Return type:

(list of lists)

process(query_text, allowed_nlp_classes=None, locale=None, language=None, time_zone=None, timestamp=None, dynamic_resource=None, verbose=False)[source]

Processes the given input text using the hierarchy of natural language processing models trained for this domain.

Parameters:
  • query_text (str, or list/tuple) -- The raw user text input, or a list of the n-best query transcripts from ASR.
  • allowed_nlp_classes (dict, optional) -- A dictionary of the intent section of the NLP hierarchy that is selected for NLP analysis. An example: {'close_door': {}} where close_door is the intent. The intent belongs to the smart_home domain. If allowed_nlp_classes is None, we use the normal model predict functionality.
  • locale (str, optional) -- The locale, given as an ISO 639-1 language code and an ISO 3166-1 alpha-2 country code separated by an underscore character.
  • language (str, optional) -- Language as specified using a 639-1/2 code.
  • time_zone (str, optional) -- The name of an IANA time zone, such as 'America/Los_Angeles' or 'Asia/Kolkata'. See the [tz database](https://www.iana.org/time-zones) for more information.
  • timestamp (long, optional) -- A unix time stamp for the request (in seconds).
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.
  • verbose (bool, optional) -- If True, returns class probabilities along with class prediction.
Returns:

A processed query object that contains the prediction results from applying the hierarchy of natural language processing models to the input text.

Return type:

(ProcessedQuery)
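The allowed_nlp_classes argument of this method covers only the intent section of the hierarchy, whereas the application-level examples elsewhere in this module use the full {domain: {intent: {}}} form. The following is a minimal sketch of deriving one from the other. The helper name intent_section is an assumption made for illustration and is not part of MindMeld.

```python
def intent_section(app_level_classes, domain):
    """Return the intent-level slice of an application-level hierarchy.

    Hypothetical helper for illustration; it is not part of MindMeld.
    """
    if app_level_classes is None:
        # None means "no restriction", so pass it through unchanged.
        return None
    # Fail loudly (KeyError) if the domain is not in the hierarchy.
    return app_level_classes[domain]


app_level = {"smart_home": {"close_door": {}}}
domain_level = intent_section(app_level, "smart_home")
print(domain_level)
```

The resulting dictionary has the same shape as the {'close_door': {}} example given for allowed_nlp_classes above.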

process_query(query, allowed_nlp_classes=None, dynamic_resource=None, verbose=False)[source]

Processes the given query using the full hierarchy of natural language processing models trained for this application.

Parameters:
  • query (Query, or tuple) -- The user input query, or a tuple of query objects for the n-best transcripts.
  • allowed_nlp_classes (dict, optional) -- A dictionary of the intent section of the NLP hierarchy that is selected for NLP analysis. An example: {'close_door': {}} where close_door is the intent. The intent belongs to the smart_home domain. If allowed_nlp_classes is None, we use the normal model predict functionality.
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.
  • verbose (bool, optional) -- If True, returns class probabilities along with class prediction.
Returns:

A processed query object that contains the prediction results from applying the full hierarchy of natural language processing models to the input query.

Return type:

(ProcessedQuery)

unload()[source]

intents

The intents supported within this domain (dict).

class mindmeld.components.nlp.EntityProcessor(app_path, domain, intent, entity_type, resource_loader=None, progress_bar=None)[source]

Bases: mindmeld.components.nlp.Processor

The entity processor houses the hierarchy of entity-specific natural language processing models required for analyzing a specific entity type in the user input.

domain

str -- The domain this entity belongs to.

intent

str -- The intent this entity belongs to.

type

str -- The type of this entity.

name

str -- The type of this entity.

role_classifier

RoleClassifier -- The role classifier for this entity type.

process_entity(query, entities, entity_index, allowed_nlp_classes, verbose=False)[source]

Processes the given entity using the hierarchy of natural language processing models trained for this entity type.

Parameters:
  • query (Query) -- The query the entity originated from.
  • entities (list) -- All entities recognized in the query.
  • entity_index (int) -- The index of the entity to process.
  • allowed_nlp_classes (dict) -- A dictionary of the NLP hierarchy that is selected for NLP analysis. An example: {'smart_home': {'close_door': {}}} where smart_home is the domain and close_door is the intent.
  • verbose (bool) -- If set to True, returns confidence scores of classes.
Returns:

Tuple containing:
  • ProcessedQuery: A processed query object that contains the prediction results from applying the hierarchy of natural language processing models to the input entity.
  • confidence_score: The confidence scores returned by the classifier.

Return type:

(tuple)

process_query(query, allowed_nlp_classes=None, dynamic_resource=None, verbose=False)[source]

Not implemented

resolve_entity(entity, aligned_entity_spans=None)[source]

Resolves a single entity. If aligned_entity_spans is not None, the resolution leverages the entity spans from the n-best transcripts. Otherwise, it resolves using only the text of the entity.

Parameters:
  • entity (QueryEntity) -- The entity to process.
  • aligned_entity_spans (list[QueryEntity]) -- The list of aligned n-best entity spans to improve resolution.
Returns:

The entity populated with the resolved values.

Return type:

(Entity)

unload()[source]

ready

class mindmeld.components.nlp.IntentProcessor(app_path, domain, intent, resource_loader=None, progress_bar=None)[source]

Bases: mindmeld.components.nlp.Processor

The intent processor houses the hierarchy of intent-specific natural language processing models required for understanding the user input for a particular intent.

domain

str -- The domain this intent belongs to.

name

str -- The name of this intent.

entity_recognizer

EntityRecognizer -- The entity recognizer for this intent.

get_entity_processors(label_set=None)[source]

process(query_text, allowed_nlp_classes=None, locale=None, language=None, time_zone=None, timestamp=None, dynamic_resource=None, verbose=False)[source]

Processes the given input text using the hierarchy of natural language processing models trained for this intent.

Parameters:
  • query_text (str, list, tuple) -- The raw user text input, or a list of the n-best query transcripts from ASR.
  • locale (str, optional) -- The locale, given as an ISO 639-1 language code and an ISO 3166-1 alpha-2 country code separated by an underscore character.
  • language (str, optional) -- Language as specified using a 639-1/2 code.
  • time_zone (str, optional) -- The name of an IANA time zone, such as 'America/Los_Angeles' or 'Asia/Kolkata'. See the [tz database](https://www.iana.org/time-zones) for more information.
  • timestamp (long, optional) -- A unix time stamp for the request (in seconds).
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.
  • verbose (bool, optional) -- If True, returns class probabilities along with class prediction.
Returns:

A processed query object that contains the prediction results from applying the hierarchy of natural language processing models to the input text.

Return type:

(ProcessedQuery)
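Because query_text may be a list or tuple of n-best transcripts, it can help to see how such a value might be assembled. The sketch below assumes the ASR system returns (text, confidence) pairs and that the most confident transcript should come first. The helper to_nbest and the sample data are hypothetical and are not part of MindMeld.

```python
def to_nbest(hypotheses, n=3):
    """Return the texts of the n highest-scoring hypotheses as a tuple.

    `hypotheses` is assumed to be an iterable of (text, confidence) pairs;
    this input format is an assumption, not something MindMeld defines.
    """
    ranked = sorted(hypotheses, key=lambda pair: pair[1], reverse=True)
    return tuple(text for text, _ in ranked[:n])


asr_output = [
    ("close the door", 0.91),
    ("clothes the door", 0.42),
    ("close the drawer", 0.67),
    ("clove the door", 0.12),
]
query_text = to_nbest(asr_output)
print(query_text)
```

The resulting tuple can then be passed as the query_text argument in place of a single string.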

process_query(query, allowed_nlp_classes=None, dynamic_resource=None, max_ngram_search=3, verbose=False)[source]

Processes the given query using the hierarchy of natural language processing models trained for this intent.

Parameters:
  • query (Query, tuple) -- The user input query, or a tuple of query objects for the n-best transcripts.
  • allowed_nlp_classes (dict, optional) -- A dictionary of the NLP hierarchy that is selected for NLP analysis. An example: {'smart_home': {'close_door': {}}} where smart_home is the domain and close_door is the intent.
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.
  • max_ngram_search (int, optional) -- The maximum n-gram size to use when processing the query for search.
  • verbose (bool, optional) -- If True, returns class probabilities along with class prediction.
Returns:

A processed query object that contains the prediction results from applying the hierarchy of natural language processing models to the input query.

Return type:

(ProcessedQuery)

unload()[source]

entities

The entity types associated with this intent (list).

nbest_transcripts_enabled

Whether or not to run processing on the n-best transcripts for this intent (bool).

class mindmeld.components.nlp.NaturalLanguageProcessor(app_path, resource_loader=None, config=None, progress_bar=None)[source]

Bases: mindmeld.components.nlp.Processor

The natural language processor is the MindMeld component responsible for understanding the user input using a hierarchy of natural language processing models.

domain_classifier

DomainClassifier -- The domain classifier for this application.

extract_nlp_masked_components_list(allow_nlp_components_list=None, deny_nlp_components_list=None)[source]

Validates a user-supplied list of allowed NLP components against the NLP hierarchy and, if the validation passes, constructs a hierarchy dictionary of the form {domain: {intent: {}}}.

Parameters:
  • allow_nlp_components_list (list) -- A list of allowed NLP components in the format "domain.intent.entity.role".
  • deny_nlp_components_list (list) -- A list of denied NLP components in the format "domain.intent.entity.role".
Returns:

A dictionary of NLP hierarchy.

Return type:

(dict)
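To make the "domain.intent.entity.role" format concrete, here is a minimal sketch that turns dotted component strings into a nested dictionary. It is only an illustration under stated assumptions: the function name components_to_hierarchy is hypothetical, every dot-separated level is nested, and the sketch performs neither the validation against the NLP hierarchy nor the deny-list handling that the method above describes.

```python
def components_to_hierarchy(components):
    """Nest dotted component strings into a dictionary of dictionaries.

    Hypothetical helper; it does not validate against a real NLP hierarchy.
    """
    hierarchy = {}
    for component in components:
        node = hierarchy
        for part in component.split("."):
            # Create the level if it is missing, then descend into it.
            node = node.setdefault(part, {})
    return hierarchy


allowed = components_to_hierarchy(
    ["smart_home.close_door", "smart_home.open_door", "weather"]
)
print(allowed)
```

A component that names only a domain, such as "weather" here, yields an empty dictionary for that domain.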

inspect(markup, domain=None, intent=None, dynamic_resource=None)[source]

Inspect the marked up query and print the table of features and weights.

Parameters:
  • markup (str) -- The marked up query string.
  • domain (str) -- The gold value for domain classification.
  • intent (str) -- The gold value for intent classification.
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.

static print_inspect_stats(stats)[source]

Prints the formatted output matrix.

process(query_text, allowed_nlp_classes=None, allowed_intents=None, allow_nlp=None, deny_nlp=None, locale=None, language=None, time_zone=None, timestamp=None, dynamic_resource=None, verbose=False)[source]

Processes the given query using the full hierarchy of natural language processing models trained for this application.

Parameters:
  • query_text (str, tuple) -- The raw user text input, or a list of the n-best query transcripts from ASR.
  • allowed_nlp_classes (dict, optional) -- A dictionary of the NLP hierarchy that is selected for NLP analysis. An example: {'smart_home': {'close_door': {}}} where smart_home is the domain and close_door is the intent.
  • allowed_intents (list, optional) -- A list of allowed intents to use for the NLP processing.
  • allow_nlp (list, optional) -- A list of allowed NLP components to use for the NLP processing.
  • deny_nlp (list, optional) -- A list of denied NLP components to exclude from the NLP processing.
  • locale (str, optional) -- The locale, given as an ISO 639-1 language code and an ISO 3166-1 alpha-2 country code separated by an underscore character.
  • language (str, optional) -- Language as specified using a 639-1/2 code. This parameter is deprecated and ignored; language is an application-level parameter.
  • time_zone (str, optional) -- The name of an IANA time zone, such as 'America/Los_Angeles' or 'Asia/Kolkata'. See the [tz database](https://www.iana.org/time-zones) for more information.
  • timestamp (long, optional) -- A unix time stamp for the request (in seconds).
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.
  • verbose (bool, optional) -- If True, returns class probabilities along with class prediction.
Returns:

A processed query object that contains the prediction results from applying the full hierarchy of natural language processing models to the input query.

Return type:

(ProcessedQuery)
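The locale, time_zone, and timestamp parameters above have specific formats, so a short sketch of assembling them may help. The helper build_request_context and the function process_text are hypothetical names chosen for illustration. The second function assumes MindMeld is installed and that models for the application at app_path were previously built and saved.

```python
import time


def build_request_context(language_code, country_code, time_zone, now=None):
    """Assemble the optional request arguments (hypothetical helper)."""
    return {
        # ISO 639-1 language code, underscore, ISO 3166-1 alpha-2 country code.
        "locale": f"{language_code.lower()}_{country_code.upper()}",
        # Name of an IANA time zone; it is passed through unchecked here.
        "time_zone": time_zone,
        # Unix time stamp in whole seconds.
        "timestamp": int(now if now is not None else time.time()),
    }


context = build_request_context("en", "us", "America/Los_Angeles", now=1700000000.9)
print(context)


def process_text(app_path, text, **request_context):
    # Imported lazily so the helper above runs without MindMeld installed.
    from mindmeld.components.nlp import NaturalLanguageProcessor

    nlp = NaturalLanguageProcessor(app_path)
    nlp.load()  # assumes models were built and dumped beforehand
    return nlp.process(text, **request_context)
```

A call such as process_text(app_path, "close the door", **context) would then forward the three values as keyword arguments.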

process_query(query, allowed_nlp_classes=None, dynamic_resource=None, verbose=False)[source]

Processes the given query using the full hierarchy of natural language processing models trained for this application.

Parameters:
  • query (Query, tuple) -- The user input query, or a tuple of query objects for the n-best transcripts.
  • allowed_nlp_classes (dict, optional) -- A dictionary of the NLP hierarchy that is selected for NLP analysis. An example: {'smart_home': {'close_door': {}}} where smart_home is the domain and close_door is the intent. If allowed_nlp_classes is None, we just use the normal model predict functionality.
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.
  • verbose (bool, optional) -- If True, returns class probabilities along with class prediction.
Returns:

A processed query object that contains the prediction results from applying the full hierarchy of natural language processing models to the input query.

Return type:

(ProcessedQuery)

unload()[source]

domains

The domains supported by this application.

class mindmeld.components.nlp.Processor(app_path, resource_loader=None, config=None)[source]

Bases: abc.ABC

A generic base class for processing queries through the MindMeld NLP components.

resource_loader

ResourceLoader -- An object which can load resources for the processor.

dirty

bool -- Indicates whether the processor has unsaved changes to its models.

ready

bool -- Indicates whether the processor is ready to process messages.

build(incremental=False, label_set=None)[source]

Builds all the natural language processing models for this processor and its children.

Parameters:
  • incremental (bool, optional) -- When True, only build models whose training data or configuration has changed since the last build. Defaults to False.
  • label_set (str, optional) -- The label set from which to train all classifiers.

create_query(query_text, locale=None, language=None, time_zone=None, timestamp=None)[source]

Creates a query with the given text.

Parameters:
  • query_text (str, list[str]) -- Text or list of texts to create a query object for.
  • locale (str, optional) -- The locale, given as an ISO 639-1 language code and an ISO 3166-1 alpha-2 country code separated by an underscore character.
  • language (str, optional) -- Language as specified using a 639-1/2 code.
  • time_zone (str, optional) -- The name of an IANA time zone, such as 'America/Los_Angeles' or 'Asia/Kolkata'. See the [tz database](https://www.iana.org/time-zones) for more information.
  • timestamp (long, optional) -- A unix time stamp for the request (in seconds).
Returns:

A newly constructed query or tuple of queries.

Return type:

(Query)

dump()[source]

Saves all the natural language processing models for this processor and its children to disk.
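The build(), evaluate(), and dump() methods of this class are commonly used together. The sketch below shows one possible ordering, using only the signatures documented for this class. The function names train_evaluate_save and main are hypothetical, and main assumes MindMeld is installed and that app_path points to a valid application.

```python
def train_evaluate_save(nlp, incremental=False, print_stats=False):
    """Build, evaluate, and persist the models of a processor.

    `nlp` is any object exposing build(), evaluate(), and dump() as
    documented for Processor; the ordering here is one choice, not a rule.
    """
    nlp.build(incremental=incremental)
    nlp.evaluate(print_stats=print_stats)
    nlp.dump()


def main(app_path):
    # Imported lazily so train_evaluate_save can be used without MindMeld.
    from mindmeld.components.nlp import NaturalLanguageProcessor

    nlp = NaturalLanguageProcessor(app_path)
    train_evaluate_save(nlp, incremental=True)
```

Keeping the sequence in a function that accepts any processor means the same steps apply to a DomainProcessor or IntentProcessor as well.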

evaluate(print_stats=False, label_set=None)[source]

Evaluates all the natural language processing models for this processor and its children.

Parameters:
  • print_stats (bool) -- If True, prints the full stats table. Otherwise, prints just the accuracy.
  • label_set (str, optional) -- The label set from which to evaluate all classifiers.

load(incremental_timestamp=None)[source]

Loads all the natural language processing models for this processor and its children from disk.

Parameters:
  • incremental_timestamp (str, optional) -- The incremental timestamp value.

process(query_text, allowed_nlp_classes=None, locale=None, language=None, time_zone=None, timestamp=None, dynamic_resource=None, verbose=False)[source]

Processes the given query using the full hierarchy of natural language processing models trained for this application.

Parameters:
  • query_text (str, tuple) -- The raw user text input, or a list of the n-best query transcripts from ASR.
  • allowed_nlp_classes (dict, optional) -- A dictionary of the NLP hierarchy that is selected for NLP analysis. An example: {'smart_home': {'close_door': {}}} where smart_home is the domain and close_door is the intent.
  • locale (str, optional) -- The locale, given as an ISO 639-1 language code and an ISO 3166-1 alpha-2 country code separated by an underscore character.
  • language (str, optional) -- Language as specified using a 639-1/2 code. This parameter is deprecated; language is an application-level parameter.
  • time_zone (str, optional) -- The name of an IANA time zone, such as 'America/Los_Angeles' or 'Asia/Kolkata'. See the [tz database](https://www.iana.org/time-zones) for more information.
  • timestamp (long, optional) -- A unix time stamp for the request (in seconds).
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.
  • verbose (bool, optional) -- If True, returns class probabilities along with class prediction.
Returns:

A processed query object that contains the prediction results from applying the full hierarchy of natural language processing models to the input query.

Return type:

(ProcessedQuery)

process_query(query, allowed_nlp_classes=None, dynamic_resource=None, verbose=False)[source]

Processes the given query using the full hierarchy of natural language processing models trained for this application.

Parameters:
  • query (Query, tuple) -- The user input query, or a tuple of query objects for the n-best transcripts.
  • allowed_nlp_classes (dict, optional) -- A dictionary of the NLP hierarchy that is selected for NLP analysis. An example: {'smart_home': {'close_door': {}}} where smart_home is the domain and close_door is the intent.
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference.
  • verbose (bool, optional) -- If True, returns class probabilities along with class prediction.
Returns:

A processed query object that contains the prediction results from applying the full hierarchy of natural language processing models to the input query.

Return type:

(ProcessedQuery)

unload()[source]

incremental_timestamp

The incremental timestamp of this processor (str).

instance_map = <WeakValueDictionary>

The map of identity to instance.

mindmeld.components.nlp.restart_subprocesses()[source]

Restarts the process pool executor.

mindmeld.components.nlp.subproc_call_instance_function(instance_id, func_name, *args, **kwargs)[source]

A module function used as a trampoline to call an instance function from within a long running child process.

Parameters:
  • instance_id (number) -- The id(inst) of the Processor instance whose function needs to be called.
Returns:

The result of the called function.
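The trampoline idea, combined with an identity-to-instance map like instance_map, can be illustrated in a single process. The sketch below is not MindMeld's implementation: the Worker class, its greet method, and call_instance_function are hypothetical stand-ins, and no child process is involved.

```python
import weakref


class Worker:
    # Maps id(instance) to the instance without keeping it alive.
    instance_map = weakref.WeakValueDictionary()

    def __init__(self, name):
        self.name = name
        Worker.instance_map[id(self)] = self

    def greet(self, greeting, punctuation="!"):
        return f"{greeting}, {self.name}{punctuation}"


def call_instance_function(instance_id, func_name, *args, **kwargs):
    """Module-level trampoline: look up the instance, then call the method."""
    instance = Worker.instance_map[instance_id]
    return getattr(instance, func_name)(*args, **kwargs)


worker = Worker("nlp")
result = call_instance_function(id(worker), "greet", "hello", punctuation="?")
print(result)
```

Passing an id and a method name instead of a bound method keeps the call arguments simple to hand to another process, and the weak references let instances be garbage collected once nothing else refers to them.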