mindmeld.active_learning.classifiers module

This module contains classifiers for the Active Learning Pipeline.

class mindmeld.active_learning.classifiers.ALClassifier(app_path: str, tuning_level: list)

Bases: abc.ABC

Abstract class for Active Learning Classifiers.

train()
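
The abstract-base pattern described above can be sketched in isolation. This is a minimal illustration, not MindMeld's implementation: `SketchALClassifier` and `ConstantClassifier` are hypothetical names, the constructor arguments are placeholder values, and treating `train()` as an abstract method is an assumption that the page does not confirm.

```python
from abc import ABC, abstractmethod


class SketchALClassifier(ABC):
    """Hypothetical stand-in for an abstract active learning classifier."""

    def __init__(self, app_path: str, tuning_level: list):
        self.app_path = app_path
        self.tuning_level = tuning_level

    @abstractmethod
    def train(self):
        """Subclasses supply the training logic (assumed to be abstract)."""
        raise NotImplementedError


class ConstantClassifier(SketchALClassifier):
    """Toy subclass that only shows train() being overridden."""

    def train(self):
        # Placeholder result; a real classifier would fit a model here.
        return {"accuracy": 1.0}


toy = ConstantClassifier("path/to/app", ["domain"])
toy_stats = toy.train()
```

Instantiating `SketchALClassifier` directly raises `TypeError` because of the abstract method, which is the usual reason for deriving from `abc.ABC`.
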
class mindmeld.active_learning.classifiers.MindMeldALClassifier(app_path: str, tuning_level: list, n_classifiers: int, aggregate_statistic: str = None, class_level_statistic: str = None)

Bases: mindmeld.active_learning.classifiers.ALClassifier

Active Learning classifier that uses MindMeld classifiers internally. Handles the training of MindMeld components (Domain or Intent classifiers) and collecting performance statistics (eval_stats).

domain_classifier_fit_eval(sampled_queries: mindmeld.resource_loader.ProcessedQueryList, unsampled_queries: mindmeld.resource_loader.ProcessedQueryList, test_queries: mindmeld.resource_loader.ProcessedQueryList, domain2id: Dict)

Fit and evaluate the domain classifier.

Parameters:
  • sampled_queries (ProcessedQueryList) -- List of sampled queries.
  • unsampled_queries (ProcessedQueryList) -- List of unsampled queries.
  • test_queries (ProcessedQueryList) -- List of test queries.
  • domain2id (Dict) -- Dictionary mapping domains to IDs.
Returns:
  • dc_queries_prob_vectors (List[List]) -- List of probability distributions for unsampled queries.
  • dc_eval_test (mindmeld.models.model.StandardModelEvaluation) -- MindMeld evaluation object for the domain classifier.

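To make the shape of the returned probability vectors concrete, here is a minimal sketch. It assumes that each vector is indexed by the IDs in domain2id; the page does not state this ordering. The helper `to_prob_vector` and the domain names are hypothetical and not part of MindMeld.

```python
from typing import Dict, List


def to_prob_vector(class_probs: Dict[str, float], label2id: Dict[str, int]) -> List[float]:
    """Place each class probability at the index given by label2id."""
    vector = [0.0] * len(label2id)
    for label, prob in class_probs.items():
        vector[label2id[label]] = prob
    return vector


# Hypothetical domains and per-query class probabilities.
domain2id = {"greeting": 0, "weather": 1}
per_query_probs = [
    {"greeting": 0.9, "weather": 0.1},
    {"weather": 0.7, "greeting": 0.3},
]

# One probability vector per unsampled query (assumed layout).
dc_queries_prob_vectors = [to_prob_vector(p, domain2id) for p in per_query_probs]
```
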
entity_recognizers_fit_eval(sampled_queries: mindmeld.resource_loader.ProcessedQueryList, unsampled_queries: mindmeld.resource_loader.ProcessedQueryList, test_queries: mindmeld.resource_loader.ProcessedQueryList, domain_to_intents: Dict, entity2id: Dict)

Fit and evaluate the entity recognizer.

Parameters:
  • sampled_queries (ProcessedQueryList) -- List of sampled queries.
  • unsampled_queries (ProcessedQueryList) -- List of unsampled queries.
  • test_queries (ProcessedQueryList) -- List of test queries.
  • domain_to_intents (Dict) -- Dictionary mapping domain to list of intents.
  • entity2id (Dict) -- Dictionary mapping entities to IDs.
Returns:
  • ic_queries_prob_vectors (List[List]) -- List of probability distributions for unsampled queries.
  • ic_eval_test_dict (Dict) -- Dictionary mapping a domain (str) to the associated ic_eval_test object.

intent_classifiers_fit_eval(sampled_queries: mindmeld.resource_loader.ProcessedQueryList, unsampled_queries: mindmeld.resource_loader.ProcessedQueryList, test_queries: mindmeld.resource_loader.ProcessedQueryList, domain_list: Dict, domain_to_intent2id: Dict)

Fit and evaluate the intent classifier.

Parameters:
  • sampled_queries (ProcessedQueryList) -- List of sampled queries.
  • unsampled_queries (ProcessedQueryList) -- List of unsampled queries.
  • test_queries (ProcessedQueryList) -- List of test queries.
  • domain_list (List[str]) -- List of domains used by the application.
  • domain_to_intent2id (Dict) -- Dictionary mapping intents to IDs.
Returns:
  • ic_queries_prob_vectors (List[List]) -- List of probability distributions for unsampled queries.
  • ic_eval_test_dict (Dict) -- Dictionary mapping a domain (str) to the associated ic_eval_test object.

train(data_bucket: mindmeld.active_learning.data_loading.DataBucket, heuristic: mindmeld.active_learning.heuristics.Heuristic, tuning_type: mindmeld.constants.TuningType = <TuningType.CLASSIFIER: 'classifier'>)

Main training function.

Parameters:
  • data_bucket (DataBucket) -- DataBucket for current iteration
  • heuristic (Heuristic) -- Current Heuristic.
  • tuning_type (TuningType) -- Component to be tuned ("classifier" or "tagger")
Returns:
  • eval_stats (defaultdict) -- Evaluation metrics to be included in accuracies.json.
  • confidences_2d (List[List]) -- 2D array with probability vectors for unsampled queries (returns a 3D output for tagger tuning).
  • confidences_3d (List[List[List]]) -- 3D array with probability vectors for unsampled queries from multiple classifiers.
  • domain_indices (Dict) -- Maps domains to a tuple containing the start and ending indexes of intents with the given domain.
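
The domain_indices mapping is easiest to picture with a small example. The sketch below rests on two assumptions that the page does not confirm: intents are laid out contiguously, domain by domain, in one flat vector, and the ending index is inclusive. The function `build_domain_indices` and the sample domains are hypothetical.

```python
from typing import Dict, List, Tuple


def build_domain_indices(domain_to_intents: Dict[str, List[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each domain to the (start, end) span of its intents in a flat vector."""
    indices = {}
    start = 0
    for domain, intents in domain_to_intents.items():
        end = start + len(intents) - 1  # inclusive end index (assumption)
        indices[domain] = (start, end)
        start = end + 1
    return indices


# Hypothetical application with two domains and five intents in total.
domain_indices = build_domain_indices(
    {
        "greeting": ["hello", "goodbye"],
        "weather": ["forecast", "alerts", "radar"],
    }
)
```

With these spans, the intent probabilities for one domain can be sliced out of a flat per-query vector as `vector[start:end + 1]`.
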

train_multi(data_bucket: mindmeld.active_learning.data_loading.DataBucket)

Trains multiple models to get a 3D probability array for multi-model selection strategies.

Parameters:
  • data_bucket (DataBucket) -- DataBucket for current iteration.
Returns:
  • confidences_3d (List[List[List]]) -- 3D array with probability vectors for unsampled queries from multiple classifiers.

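As a rough picture of how a 3D probability array could feed a multi-model strategy, the sketch below averages the per-classifier distributions into one consensus distribution per query. The axis order (classifier, query, class) is an assumption, and `mean_over_classifiers` is a hypothetical helper, not a MindMeld function.

```python
from typing import List


def mean_over_classifiers(confidences_3d: List[List[List[float]]]) -> List[List[float]]:
    """Average class probabilities across classifiers for every query."""
    n_classifiers = len(confidences_3d)
    n_queries = len(confidences_3d[0])
    n_classes = len(confidences_3d[0][0])
    return [
        [
            sum(confidences_3d[m][q][c] for m in range(n_classifiers)) / n_classifiers
            for c in range(n_classes)
        ]
        for q in range(n_queries)
    ]


# Two hypothetical classifiers, two unsampled queries, two classes.
confidences_3d = [
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.25, 0.75], [1.0, 0.0]],
]
consensus = mean_over_classifiers(confidences_3d)
```

The classifiers disagree on the first query and agree on the second, so a disagreement-based strategy would tend to prefer the first query for labeling.
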
train_single(data_bucket: mindmeld.active_learning.data_loading.DataBucket, eval_stats: collections.defaultdict = None)

Trains a single model to get a 2D probability array for single-model selection strategies.

Parameters:
  • data_bucket (DataBucket) -- DataBucket for current iteration.
  • eval_stats (defaultdict) -- Evaluation metrics to be included in accuracies.json.
Returns:
  • confidences_2d (List) -- 2D array with probability vectors for unsampled queries (returns a 3D output for tagger tuning).
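
A single-model selection strategy consumes the 2D array one row per query. The sketch below shows a generic least-confidence ranking, offered as an illustration of the idea rather than as the way MindMeld's heuristics are implemented. The function name and the sample values are hypothetical.

```python
from typing import List


def rank_by_least_confidence(confidences_2d: List[List[float]]) -> List[int]:
    """Return query indices ordered from least to most confident prediction."""
    top_probs = [max(row) for row in confidences_2d]
    return sorted(range(len(confidences_2d)), key=lambda i: top_probs[i])


# Hypothetical probability vectors for three unsampled queries.
confidences_2d = [
    [0.9, 0.1],
    [0.55, 0.45],
    [0.7, 0.3],
]
ranking = rank_by_least_confidence(confidences_2d)
```

Queries at the front of `ranking` are the ones the model is least sure about, so they are the natural candidates to label next.
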