mindmeld.models.tagger_models module

This module contains the Memm entity recognizer.

class mindmeld.models.tagger_models.PytorchTaggerModel(config)[source]

Bases: mindmeld.models.model.PytorchModel

evaluate(examples, labels)[source]

Evaluates a model against the given examples and labels

Parameters:
  • examples -- A list of examples to predict
  • labels -- A list of expected labels
Returns:

an object containing information about the evaluation

Return type:

ModelEvaluation

fit(examples, labels, params=None)[source]
classmethod load(path)[source]
predict(examples, dynamic_resource=None)[source]
predict_proba(examples, dynamic_resource=None)[source]
ALLOWED_CLASSIFIER_TYPES = ['embedder', 'lstm-pytorch', 'cnn-lstm', 'lstm-lstm']
class mindmeld.models.tagger_models.TaggerModel(config)[source]

Bases: mindmeld.models.model.Model

A machine learning classifier for tags.

This class manages feature extraction, training, cross-validation, and prediction. The design goal is that after providing initial settings like hyperparameters, grid-searchable hyperparameters, feature extractors, and cross-validation settings, TaggerModel manages all of the details involved in training and prediction such that the input to training or prediction is Query objects, and the output is class names, and no data manipulation is needed from the client.

classifier_type

str -- The name of the classifier type. Currently recognized values are "memm","crf", and "lstm"

hyperparams

dict -- A kwargs dict of parameters that will be used to initialize the classifier object.

grid_search_hyperparams

dict -- Like 'hyperparams', but the values are lists of parameters. The training process will grid search over the Cartesian product of these parameter lists and select the best via cross-validation.

feat_specs

dict -- A mapping from feature extractor names, as given in FEATURE_NAME_MAP, to a kwargs dict, which will be passed into the associated feature extractor function.

cross_validation_settings

dict -- A dict that contains "type", which specifies the name of the cross-validation strategy, such as "k-folds" or "shuffle". The remaining keys are parameters specific to the cross-validation type, such as "k" when the type is "k-folds".

evaluate(examples, labels, fetch_distribution=False)[source]

Evaluates a model against the given examples and labels

Parameters:
  • examples -- A list of examples to predict
  • labels -- A list of expected labels
Returns:

an object containing information about the evaluation

Return type:

ModelEvaluation

fit(examples, labels, params=None)[source]

Trains the model.

Parameters:
  • examples (ProcessedQueryList.QueryIterator) -- A list of queries to train on.
  • labels (ProcessedQueryList.EntitiesIterator) -- A list of expected labels.
  • params (dict) -- Parameters of the classifier.
get_feature_matrix(examples, y=None, fit=False)[source]
classmethod load(path)[source]

Load the model state to memory.

Parameters:path (str) -- The path to dump the model to
predict(examples, dynamic_resource=None)[source]
Parameters:
  • examples (list of mindmeld.core.Query) -- a list of queries to train on
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference
Returns:

a list of predicted labels

Return type:

(list of tuples of mindmeld.core.QueryEntity)

predict_proba(examples, dynamic_resource=None, fetch_distribution=False)[source]
Parameters:
  • examples (list of mindmeld.core.Query) -- a list of queries to train on
  • dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference
Returns:

a list of predicted labels with confidence scores

Return type:

list of tuples of (mindmeld.core.QueryEntity)

select_params(examples, labels, selection_settings=None)[source]
Selects the best set of hyper-parameters for a given set of examples and true labels
through cross-validation
Parameters:
  • examples -- A list of example queries
  • labels -- A list of labels associated with the queries
  • selection_settings -- A dictionary of parameter lists to select from
Returns:

A dictionary of optimized parameters to use

Return type:

dict

unload()[source]
view_extracted_features(query, dynamic_resource=None)[source]

Returns a dictionary of extracted features and their weights for a given query

Parameters:
  • query (mindmeld.core.Query) -- The query to extract features from
  • dynamic_resource (dict) -- The dynamic resource used along with the query
Returns:

A list of dictionaries of extracted features and their weights

Return type:

list

ACCURACY_SCORING = 'accuracy'
ALLOWED_CLASSIFIER_TYPES = ['crf', 'memm', 'lstm']
CRF_TYPE = 'crf'
DEFAULT_FEATURES = {'bag-of-words-seq': {'ngram_lengths_to_start_positions': {1: [-2, -1, 0, 1, 2], 2: [-2, -1, 0, 1]}}, 'in-gaz-span-seq': {}, 'sys-candidates-seq': {'start_positions': [-1, 0, 1]}}
LSTM_TYPE = 'lstm'
MEMM_TYPE = 'memm'
SEQUENCE_MODELS = ['crf']
SEQ_ACCURACY_SCORING = 'seq_accuracy'
class mindmeld.models.tagger_models.TaggerModelFactory[source]

Bases: mindmeld.models.model.AbstractModelFactory

static get_model_cls(config: mindmeld.models.model.ModelConfig)[source]