mindmeld.models.taggers.memm module

This module contains the MEMM (maximum-entropy Markov model) entity recognizer.

class mindmeld.models.taggers.memm.MemmModel(**parameters)[source]

Bases: mindmeld.models.taggers.taggers.Tagger

A maximum-entropy Markov model.

extract_and_predict(examples, config, resources)[source]

Performs both feature extraction and prediction. This is often necessary for sequence models, where the prediction for the previous example is used as a feature for the next example. When that is not the case, extract is simply called before predict. The MindMeld config and resources are passed in on every call so that the underlying model implementation stays stateless.

Parameters:
  • examples (list of mindmeld.core.Query) -- A list of queries to extract features for and predict
  • config (ModelConfig) -- The ModelConfig which may contain information used for feature extraction
  • resources (dict) -- Resources which may be used for this model's feature extraction
Returns:

A list of predicted labels (in encoded format)

Return type:

(list of classification labels)
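
The reason extraction and prediction are interleaved can be illustrated with a minimal greedy decoding sketch. This is not MindMeld's implementation: `greedy_decode`, `toy_classifier`, the `START` placeholder, and the `prev_tag|...` feature names are all hypothetical, chosen only to show how each predicted tag becomes a feature for the next step.

```python
from typing import Callable, Dict, List

START_TAG = "START"  # hypothetical placeholder meaning "no previous tag"


def greedy_decode(
    token_features: List[Dict[str, float]],
    classify: Callable[[Dict[str, float]], str],
) -> List[str]:
    """Tag tokens left to right, feeding each predicted tag into the next step."""
    tags: List[str] = []
    prev_tag = START_TAG
    for features in token_features:
        # Copy so the caller's feature dicts are not mutated.
        step_features = dict(features)
        # The previous prediction is only known at decode time,
        # so it has to be added here rather than in a separate pass.
        step_features["prev_tag|" + prev_tag] = 1.0
        prev_tag = classify(step_features)
        tags.append(prev_tag)
    return tags


def toy_classifier(features: Dict[str, float]) -> str:
    """Hypothetical stand-in for a trained maximum-entropy classifier."""
    if "is_capitalized" in features:
        # Continue an entity if the previous token was already part of one.
        if "prev_tag|B-name" in features or "prev_tag|I-name" in features:
            return "I-name"
        return "B-name"
    return "O"
```

Because the `prev_tag|...` feature depends on the classifier's own output, the feature dict for a token cannot be completed until the preceding token has been tagged.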

static extract_example_features(example, config, resources)[source]

Extracts feature dicts for each token in an example.

Parameters:
  • example (mindmeld.core.Query) -- A query.
  • config (ModelConfig) -- The ModelConfig which may contain information used for feature extraction.
  • resources (dict) -- Resources which may be used for this model's feature extraction.
Returns:

Features.

Return type:

(list[dict])

extract_features(examples, config, resources, y=None, fit=True)[source]

Transforms a list of examples into a feature matrix. Use extract_and_predict if you are extracting features for an example at test time, since the previous tag prediction is needed as a feature of the next tag.

Parameters:
  • examples (list of mindmeld.core.Query) -- The examples.
  • config (ModelConfig) -- The ModelConfig which may contain information used for feature extraction
  • resources (dict) -- Resources which may be used for this model's feature extraction
Returns:

tuple containing:

  • (numpy.matrix): The feature matrix.
  • (numpy.array): The group labels for examples.

Return type:

(tuple)
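
As a rough illustration of the return value described above, the following sketch flattens per-token feature dicts into a dense matrix plus one group label per row. It assumes that the group label is the index of the example each token row came from; that reading, the helper name `build_feature_matrix`, and the use of plain lists instead of numpy types are assumptions made for this example only.

```python
from typing import Dict, List, Tuple


def build_feature_matrix(
    examples: List[List[Dict[str, float]]],
) -> Tuple[List[List[float]], List[int], List[str]]:
    """Turn per-token feature dicts into rows, with one group label per row."""
    # Collect a sorted vocabulary so column order is deterministic.
    vocab = sorted({name for ex in examples for tok in ex for name in tok})
    column = {name: col for col, name in enumerate(vocab)}

    matrix: List[List[float]] = []
    groups: List[int] = []
    for example_id, ex in enumerate(examples):
        for tok in ex:
            row = [0.0] * len(vocab)
            for name, value in tok.items():
                row[column[name]] = value
            matrix.append(row)
            # Assumed meaning of "group label": which example the row belongs to.
            groups.append(example_id)
    return matrix, groups, vocab
```

With two examples of two tokens and one token respectively, this yields three rows and the group labels `[0, 0, 1]`.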

fit(X, y)[source]

Trains the model. X and y are in the format returned by extract_features; there is no restriction on their type or content. X should be the fully processed data, with extracted features that are ready to be used to train the model. y should be a list of classes as encoded by the label_encoder.

Parameters:
  • X (list) -- Generally a list of feature vectors, one for each training example
  • y (list) -- A list of classification labels (encoded by the label_encoder, NOT MindMeld entity objects)
Returns:

self

get_params(deep=True)[source]

Gets a dictionary of all of the current model parameters and their values

Parameters:
  • deep (bool) -- Not used, needed for sklearn compatibility
Returns:

A dictionary of the model parameter names as keys and their set values

Return type:

(dict)

static load(model_path)[source]

Loads the model state into memory. This is a no-op, since nothing special is needed to load default serializable models for SKLearn.

Parameters:
  • model_path (str) -- The path to load the model from

predict(X, dynamic_resource=None)[source]

Predicts the labels from a feature matrix X. As with fit, X is in the format returned by extract_features.

Parameters:
  • X (list) -- A list of feature vectors, one for each example
Returns:

A list of predicted labels (in an encoded format)

Return type:

(list of classification labels)

predict_proba(examples, config, resources)[source]

Parameters:
  • examples (list of mindmeld.core.Query) -- A list of queries to extract features for and predict
  • config (ModelConfig) -- The ModelConfig which may contain information used for feature extraction
  • resources (dict) -- Resources which may be used for this model's feature extraction
Returns:

A list of predicted labels (in encoded format) and confidence scores

Return type:

(list of lists)

predict_proba_distribution(examples, config, resources)[source]

set_params(**parameters)[source]

Sets the model parameters. Defaults should be set for all parameters such that a model is initialized with reasonable default parameters if none are explicitly passed in.

Parameters:
  • **parameters -- Arbitrary keyword arguments. The keys are model parameter names and the values are what they should be set to
Returns:

self
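
The get_params / set_params pair follows the scikit-learn estimator convention. The sketch below shows that convention on a hypothetical `ToyModel`; the parameter names `C` and `penalty` and their defaults are illustrative assumptions, not MemmModel's actual parameters.

```python
class ToyModel:
    """Hypothetical model following the scikit-learn parameter convention."""

    def __init__(self, **parameters):
        # Set defaults first so the model is usable with no arguments.
        self.C = 1.0
        self.penalty = "l2"
        self.set_params(**parameters)

    def get_params(self, deep=True):
        # `deep` is accepted only for scikit-learn compatibility.
        return {"C": self.C, "penalty": self.penalty}

    def set_params(self, **parameters):
        # Keys are parameter names, values are what they should be set to.
        for name, value in parameters.items():
            setattr(self, name, value)
        return self  # returning self allows call chaining
```

Returning self from set_params lets calls be chained, for example `ToyModel().set_params(C=0.5).get_params()`.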

setup_model(config)[source]

Not implemented.