mindmeld.models.model module¶

This module contains base classes for models defined in the models subpackage.

class mindmeld.models.model.AbstractModel(config: mindmeld.models.model.ModelConfig)[source]¶

Bases: abc.ABC

A minimalistic abstract class upon which all models are based.

To maintain backwards compatibility, the skeleton of this class is based on the access points of the Classifier class and its subclasses. It also decouples dumping and loading across model types, since not all models are dumped or loaded the same way. Validation methods are introduced to handle model-specific config validation. Lastly, the skeleton includes some common properties that can be used across all model types.
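The decoupled dump/load idea can be illustrated with a small stand-in that does not use MindMeld. This is a minimal sketch under stated assumptions: the class names `SketchModel` and `WeightsModel`, the `_dump` hook, and the file names `config.json` and `weights.json` are all hypothetical and are not taken from the library. It only shows a base class writing the config itself and delegating model-specific serialization to the subclass.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class SketchModel(ABC):
    """Hypothetical stand-in for an abstract model base class (not MindMeld's code)."""

    CONFIG_FILE = "config.json"  # assumed file name, for illustration only

    def __init__(self, config: dict):
        self.config = config

    def dump(self, path: str) -> None:
        # The base class writes the config, then delegates to the subclass.
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, self.CONFIG_FILE), "w") as fp:
            json.dump(self.config, fp)
        self._dump(path)

    @abstractmethod
    def _dump(self, path: str) -> None:
        """Subclass-specific serialization (hypothetical hook name)."""

    @classmethod
    def load_config(cls, path: str) -> dict:
        # Fail loudly when no config file was dumped at this path.
        config_path = os.path.join(path, cls.CONFIG_FILE)
        if not os.path.exists(config_path):
            raise FileNotFoundError(config_path)
        with open(config_path) as fp:
            return json.load(fp)


class WeightsModel(SketchModel):
    """A toy subclass that stores its state in its own file format."""

    def __init__(self, config: dict, weights: list):
        super().__init__(config)
        self.weights = weights

    def _dump(self, path: str) -> None:
        with open(os.path.join(path, "weights.json"), "w") as fp:
            json.dump(self.weights, fp)


with tempfile.TemporaryDirectory() as tmp:
    WeightsModel({"model_type": "text"}, [0.1, 0.2]).dump(tmp)
    restored_config = SketchModel.load_config(tmp)
    dumped_files = sorted(os.listdir(tmp))
```

Because the config is always written the same way, a caller can read it back without knowing which subclass produced the rest of the dump.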

dump(path: str)[source]¶

Dumps the model's configs and calls the child model's dump method

Parameters:	path (str) -- The path to dump the model to

evaluate(examples: Union[mindmeld.resource_loader.ProcessedQueryList.QueryIterator, mindmeld.resource_loader.ProcessedQueryList.ListIterator], labels: Union[mindmeld.resource_loader.ProcessedQueryList.DomainIterator, mindmeld.resource_loader.ProcessedQueryList.IntentIterator, mindmeld.resource_loader.ProcessedQueryList.EntitiesIterator]) → Union[mindmeld.models.evaluation.StandardModelEvaluation, mindmeld.models.evaluation.EntityModelEvaluation][source]¶

fit(examples: Union[mindmeld.resource_loader.ProcessedQueryList.QueryIterator, mindmeld.resource_loader.ProcessedQueryList.ListIterator], labels: Union[mindmeld.resource_loader.ProcessedQueryList.DomainIterator, mindmeld.resource_loader.ProcessedQueryList.IntentIterator, mindmeld.resource_loader.ProcessedQueryList.EntitiesIterator], params: Dict = None)[source]¶

get_resource(name) → Any[source]¶

initialize_resources(resource_loader: mindmeld.resource_loader.ResourceLoader, examples: Union[mindmeld.resource_loader.ProcessedQueryList.QueryIterator, mindmeld.resource_loader.ProcessedQueryList.ListIterator] = None, labels: Union[mindmeld.resource_loader.ProcessedQueryList.DomainIterator, mindmeld.resource_loader.ProcessedQueryList.IntentIterator, mindmeld.resource_loader.ProcessedQueryList.EntitiesIterator] = None)[source]¶

classmethod load(path: str) → Type[mindmeld.models.model.AbstractModel][source]¶

classmethod load_model_config(path: str) → mindmeld.models.model.ModelConfig[source]¶

Loads the model's configs. Raises a FileNotFoundError if no configs file is found. For backwards compatibility with models where the TextModel itself was serialized and dumped, the TextModel file is loaded using joblib and the config is then obtained from its public variables.

Parameters:	path (str) -- The path where the model is dumped

predict(examples: Union[mindmeld.resource_loader.ProcessedQueryList.QueryIterator, mindmeld.resource_loader.ProcessedQueryList.ListIterator], dynamic_resource: Dict = None) → Union[List[Any], List[List[Any]]][source]¶

predict_proba(examples: Union[mindmeld.resource_loader.ProcessedQueryList.QueryIterator, mindmeld.resource_loader.ProcessedQueryList.ListIterator], dynamic_resource: Dict = None) → Union[List[Tuple[str, Dict[str, float]]], Tuple[Tuple[mindmeld.core.QueryEntity, float]]][source]¶

register_resources(**kwargs)[source]¶

view_extracted_features(example: mindmeld.core.ProcessedQuery, dynamic_resource: Dict = None) → List[Dict][source]¶

text_preparation_pipeline¶

class mindmeld.models.model.AbstractModelFactory[source]¶

Bases: abc.ABC

Abstract class for individual model factories like TextModelFactory and TaggerModelFactory

get_model_cls(config: mindmeld.models.model.ModelConfig) → Type[mindmeld.models.model.AbstractModel][source]¶

class mindmeld.models.model.Model(config)[source]¶

Bases: mindmeld.models.model.AbstractModel

An abstract class upon which all models are based.

config¶: ModelConfig -- The configuration for the model

get_feature_matrix(examples, y=None, fit=False)[source]¶

initialize_resources(resource_loader, examples=None, labels=None)[source]¶

Loads the resources required by the feature extractors. Each feature extractor uses the @requires decorator to declare its required resources. Based on the feature list in the model config, a list of required resources is compiled, and the passed-in resource loader is then used to load those resources.

Parameters:	resource_loader (ResourceLoader) -- application resource loader object
	examples (list) -- Optional. A list of examples.
	labels (list) -- Optional. A parallel list to examples. The gold labels for each example.
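The declare-then-compile flow described for initialize_resources can be sketched without MindMeld. In the sketch below, the `requires` decorator, the extractor functions, the `EXTRACTORS` registry, and the resource names are all hypothetical stand-ins written for illustration; they are not the library's implementation and may differ from it in detail.

```python
def requires(resource):
    """Hypothetical decorator that tags a feature extractor with a resource name."""
    def decorator(func):
        existing = getattr(func, "required_resources", set())
        func.required_resources = existing | {resource}
        return func
    return decorator


@requires("gazetteers")
def extract_in_gaz(**kwargs):
    return {}


@requires("word_counts")
@requires("gazetteers")
def extract_freq(**kwargs):
    return {}


def extract_length(**kwargs):
    # Declares nothing, so it contributes no required resources.
    return {}


# Assumed registry mapping feature names to extractor functions.
EXTRACTORS = {"in-gaz": extract_in_gaz, "freq": extract_freq, "length": extract_length}


def compile_required_resources(features):
    """Union the declared resources of every extractor named in the feature dict."""
    required = set()
    for name in features:
        required |= getattr(EXTRACTORS[name], "required_resources", set())
    return required


def load_resources(required, loaders):
    """Call one loader per required resource; unused loaders are never invoked."""
    return {name: loaders[name]() for name in sorted(required)}


features = {"in-gaz": {}, "length": {}}
required = compile_required_resources(features)
resources = load_resources(
    required,
    {"gazetteers": lambda: {"city": ["paris"]}, "word_counts": lambda: {"the": 10}},
)
```

Only resources that some configured feature declares are loaded, which is the point of compiling the list from the feature config first.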

register_resources(**kwargs)[source]¶

Registers resources which are accessible to feature extractors

Parameters:	**kwargs -- dictionary of resources to register

requires_resource(resource)[source]¶

select_params(examples, labels, selection_settings=None)[source]¶

Selects the best set of hyper-parameters for a given set of examples and true labels through cross-validation

Parameters:	examples -- A list of example queries
	labels -- A list of labels associated with the queries
	selection_settings -- A dictionary of parameter lists to select from
Returns:	A dictionary of optimized parameters to use
Return type:	dict
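Hyper-parameter selection through cross-validation can be sketched in plain Python. This is not how select_params is implemented; the helper names (`expand_grid`, `k_fold_indices`, `select_params_sketch`, `threshold_accuracy`) and the toy threshold "model" are assumptions made for illustration. The sketch expands a dictionary of parameter lists into candidate settings, scores each candidate on k held-out folds, and returns the candidate with the best mean score.

```python
from itertools import product


def expand_grid(grid):
    """Yield one params dict per combination of the listed values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def k_fold_indices(n_examples, k):
    """Split range(n_examples) into k contiguous held-out folds."""
    fold_size, remainder = divmod(n_examples, k)
    start = 0
    for fold in range(k):
        stop = start + fold_size + (1 if fold < remainder else 0)
        yield list(range(start, stop))
        start = stop


def select_params_sketch(examples, labels, grid, k, fit_and_score):
    """Return the params dict with the highest mean held-out score."""
    best_params, best_score = None, float("-inf")
    for params in expand_grid(grid):
        scores = []
        for held_out in k_fold_indices(len(examples), k):
            held = set(held_out)
            train = [i for i in range(len(examples)) if i not in held]
            scores.append(fit_and_score(params, examples, labels, train, held_out))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params


def threshold_accuracy(params, examples, labels, train, held_out):
    # Toy "model": no fitting, predicts 1 when the value reaches the threshold.
    hits = sum(
        (examples[i] >= params["threshold"]) == bool(labels[i]) for i in held_out
    )
    return hits / len(held_out)


examples = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]
best = select_params_sketch(
    examples, labels, {"threshold": [2, 4, 6]}, k=3, fit_and_score=threshold_accuracy
)
```

A real scorer would fit a model on the `train` indices before scoring the held-out fold; the toy scorer skips that step to keep the sketch short.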

ALLOWED_CLASSIFIER_TYPES = NotImplemented¶

LIKELIHOOD_SCORING = 'log_loss'¶

class mindmeld.models.model.ModelConfig(model_type: str = None, example_type: str = None, label_type: str = None, features: Dict = None, model_settings: Dict = None, params: Dict = None, param_selection: Dict = None, train_label_set: Pattern[str] = None, test_label_set: Pattern[str] = None)[source]¶

Bases: object

A value object representing a model configuration.

model_type¶: str -- The name of the model type. Will be used to find the model class to instantiate

example_type¶: str -- The type of the examples which will be passed into fit() and predict(). Used to select feature extractors

label_type¶: str -- The type of the labels which will be passed into fit() and returned by predict(). Used to select the label encoder

model_settings¶: dict -- Settings specific to the model type specified

params¶: dict -- Params to pass to the underlying classifier

param_selection¶: dict -- Configuration for param selection (using cross validation) {'type': 'shuffle', 'n': 3, 'k': 10, 'n_jobs': 2, 'scoring': '', 'grid': {} }

features¶: dict -- The keys are the names of feature extractors and the values are either a kwargs dict which will be passed into the feature extractor function, or a callable which will be used to extract features

train_label_set¶: regex pattern -- The regex pattern for finding training file names.

test_label_set¶: regex pattern -- The regex pattern for finding testing file names.
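To make the shape of these attributes concrete, here is a plain dictionary that uses the attribute names listed above. The values are illustrative assumptions rather than verified defaults, apart from the param_selection entry, which copies the example given in its description. The sketch uses only the standard library and does not construct a ModelConfig; it shows how a label-set pattern would filter file names and that such a config is JSON-serializable.

```python
import json
import re

# Keys follow the documented attribute names; values are placeholders.
config = {
    "model_type": "text",
    "example_type": "query",
    "label_type": "class",
    "model_settings": {"classifier_type": "logreg"},
    "params": {"C": 10},
    "param_selection": {
        "type": "shuffle", "n": 3, "k": 10, "n_jobs": 2, "scoring": "", "grid": {},
    },
    "features": {"bag-of-words": {"lengths": [1, 2]}},
    "train_label_set": r"train.*\.txt",
    "test_label_set": r"test.*\.txt",
}

# The label-set patterns are regexes matched against file names.
file_names = ["train.txt", "train_extra.txt", "test.txt"]
train_pattern = re.compile(config["train_label_set"])
train_files = [name for name in file_names if train_pattern.match(name)]

# A config made of plain dicts, lists, strings, and numbers serializes directly.
serialized = json.dumps(config, sort_keys=True)
```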

get_ngram_lengths_and_thresholds(rname: str) → Tuple[source]¶

Returns the n-gram lengths and thresholds to extract to optimize resource collection

Parameters:	rname (string) -- Name of the resource
Returns:	tuple containing:
	lengths (list of int) -- list of n-gram lengths to be extracted
	thresholds (list of int) -- thresholds to be applied to the corresponding n-gram lengths
Return type:	tuple

required_resources() → Set[source]¶

Returns the resources this model requires

Returns:	set of required resources for this model
Return type:	set

resolve_config(new_config: mindmeld.models.model.ModelConfig)[source]¶

This method resolves any config incompatibility issues by loading the latest settings from the app config to the current config

Parameters:	new_config (ModelConfig) -- The ModelConfig representing the app's latest config

to_dict() → Dict[source]¶

Converts the model config object into a dict

Returns:	A dict version of the config
Return type:	dict

to_json() → str[source]¶

Converts the model config object to JSON

Returns:	JSON representation of the model config
Return type:	str

class mindmeld.models.model.PytorchModel(config)[source]¶

Bases: mindmeld.models.model.AbstractModel

initialize_resources(resource_loader, examples=None, labels=None)[source]¶

ALLOWED_CLASSIFIER_TYPES = NotImplemented¶