mindmeld.models.helpers module

This module contains some helper functions for the models package

class mindmeld.models.helpers.FileBackedList[source]

Bases: object

FileBackedList implements a simple list interface backed by a temporary file on disk. This is useful for processing lists in a memory-efficient way.

class Iterator(source)[source]

Bases: object

append(line)[source]
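
Example (a minimal sketch; it assumes that iterating over the instance reads the appended lines back from the temporary file):

    from mindmeld.models.helpers import FileBackedList

    # Accumulate lines on disk instead of holding them all in memory.
    lines = FileBackedList()
    for i in range(3):
        lines.append("line {}".format(i))

    # Iterating streams the lines back from the backing file.
    for line in lines:
        print(line)
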
class mindmeld.models.helpers.ModelType[source]

Bases: enum.Enum

An enumeration of the supported model types.

TAGGER_MODEL = 'tagger'
TEXT_MODEL = 'text'
mindmeld.models.helpers.create_annotator(config)[source]

Creates an annotator instance using the provided configuration

Parameters:config (dict) -- A model configuration
Returns:An Annotator class
Return type:Annotator
Raises:ValueError -- When model configuration is invalid or required key is missing
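
Example (a hedged sketch; the configuration keys shown, such as 'annotator_class', are assumptions -- consult your app's config.py for the actual schema):

    from mindmeld.models.helpers import create_annotator

    annotator_config = {"annotator_class": "SpacyAnnotator"}

    try:
        annotator = create_annotator(annotator_config)
    except ValueError as exc:
        # Raised when the configuration is invalid or a required key is missing.
        print("Invalid annotator config: {}".format(exc))
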
mindmeld.models.helpers.create_embedder_model(app_path, config)[source]

Creates and loads an embedder model

Parameters:
  • app_path (str) -- The path to the MindMeld application
  • config (dict) -- Model settings passed in as a dictionary with 'embedder_type' being a required key
Returns:An instance of appropriate embedder class
Return type:Embedder
Raises:ValueError -- When model configuration is invalid or required key is missing
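
Example (a minimal sketch; 'embedder_type' is the documented required key, while the embedder value and application path are illustrative assumptions):

    from mindmeld.models.helpers import create_embedder_model

    embedder_config = {"embedder_type": "bert"}

    try:
        embedder = create_embedder_model(app_path="./my_app", config=embedder_config)
    except ValueError as exc:
        print("Invalid embedder config: {}".format(exc))
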
mindmeld.models.helpers.create_model(config)[source]

Creates a model instance using the provided configuration

Parameters:config (ModelConfig) -- A model configuration
Returns:a configured model
Return type:Model
Raises:ValueError -- When model configuration is invalid
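
Example (a hedged sketch; the import path and constructor arguments for ModelConfig are assumptions based on typical classifier configurations):

    from mindmeld.models import ModelConfig
    from mindmeld.models.helpers import create_model

    config = ModelConfig(
        model_type="text",
        example_type="query",
        label_type="class",
        model_settings={"classifier_type": "logreg"},
        features={"bag-of-words": {"lengths": [1]}},
    )

    model = create_model(config)  # raises ValueError for an invalid configuration
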
mindmeld.models.helpers.entity_seqs_equal(expected, predicted)[source]

Returns True if the expected entities and predicted entities all match, and False otherwise. Two entities are considered to match when their span, text, and type are all equal.

Parameters:
  • expected (list of core.Entity) -- A list of the expected entities for some query
  • predicted (list of core.Entity) -- A list of the predicted entities for some query
mindmeld.models.helpers.get_feature_extractor(example_type, name)[source]

Gets a feature extractor given the example type and name

Parameters:
  • example_type (str) -- The type of example
  • name (str) -- The name of the feature extractor
Returns:A feature extractor wrapper
Return type:function
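
Example (assuming 'bag-of-words' is among the registered query features, as in the default MindMeld feature configurations):

    from mindmeld.models.helpers import get_feature_extractor

    extractor = get_feature_extractor("query", "bag-of-words")
    print(extractor)  # a feature extractor wrapper (a plain function)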

mindmeld.models.helpers.get_label_encoder(config)[source]

Gets a label encoder given the label type from the config

Parameters:config (ModelConfig) -- A model configuration
Returns:The appropriate LabelEncoder object for the given config
Return type:LabelEncoder
mindmeld.models.helpers.get_ngram(tokens, start, length)[source]

Gets an n-gram from a list of tokens.

Handles out-of-bounds token positions with a special character.

Parameters:
  • tokens (list of str) -- Word tokens.
  • start (int) -- The index of the desired ngram's start position.
  • length (int) -- The length of the n-gram, e.g. 1 for unigram, etc.
Returns:An n-gram in the input token list
Return type:str
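
Example (the exact out-of-bounds placeholder is internal to the module, so it is not asserted here):

    from mindmeld.models.helpers import get_ngram

    tokens = ["set", "an", "alarm", "for", "noon"]

    print(get_ngram(tokens, 1, 2))  # bigram starting at token index 1, e.g. "an alarm"
    print(get_ngram(tokens, 4, 2))  # runs past the end; the missing position is
                                    # filled with the special character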

mindmeld.models.helpers.get_ngrams_upto_n(tokens, n)[source]

Returns a generator that yields n-gram tuples of length up to n.

Parameters:
  • tokens (list of str) -- Word tokens.
  • n (int) -- The maximum n-gram length for which n-grams are generated
Returns:ngram, (token index start, token index end)
Return type:tuple
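
Example (a minimal sketch; each yielded item pairs an n-gram with its start and end token indices):

    from mindmeld.models.helpers import get_ngrams_upto_n

    tokens = ["set", "an", "alarm"]

    for ngram, span in get_ngrams_upto_n(tokens, 2):
        print(ngram, span)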

mindmeld.models.helpers.get_seq_accuracy_scorer()[source]

Returns a scorer that can be used by sklearn's GridSearchCV based on the sequence_accuracy_scoring method below.

mindmeld.models.helpers.get_seq_tag_accuracy_scorer()[source]

Returns a scorer that can be used by sklearn's GridSearchCV based on the sequence_tag_accuracy_scoring method below.
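
Example covering both helpers (a hedged sketch; the CRF estimator and parameter grid are placeholders for whatever sequence tagger is actually being tuned):

    import sklearn_crfsuite
    from sklearn.model_selection import GridSearchCV
    from mindmeld.models.helpers import get_seq_accuracy_scorer

    # Constructing the search only wires the scorer in; fitting it requires
    # real feature dicts and tag sequences.
    search = GridSearchCV(
        estimator=sklearn_crfsuite.CRF(),
        param_grid={"c1": [0.01, 0.1], "c2": [0.01, 0.1]},
        scoring=get_seq_accuracy_scorer(),
    )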

mindmeld.models.helpers.ingest_dynamic_gazetteer(resource, dynamic_resource=None, text_preparation_pipeline=None)[source]

Ingests dynamic gazetteers from the app and adds them to the resource

Parameters:
  • resource (dict) -- The original resource
  • dynamic_resource (dict, optional) -- The dynamic resource that needs to be ingested
  • text_preparation_pipeline (TextPreparationPipeline) -- For text tokenization and normalization
Returns:A new resource with the ingested dynamic resource
Return type:dict

mindmeld.models.helpers.load_model(path)[source]

Loads a model from a specified path

Parameters:path (str) -- A path where the model configuration is pickled along with other metadata
Returns:The metadata loaded from the path, which contains the configured model under the 'model' key and the model config under the 'model_config' key, along with other keys
Return type:dict
Raises:ValueError -- When model configuration is invalid
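
Example (the path below is illustrative; it should point at a model pickled by a previous training run):

    from mindmeld.models.helpers import load_model

    metadata = load_model("./my_app/.generated/domain_model.pkl")

    model = metadata["model"]                 # the configured model
    model_config = metadata["model_config"]   # its model configuration
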
mindmeld.models.helpers.mask_numerics(token)[source]

Masks digit characters in a token

Parameters:token (str) -- A string
Returns:A masked string for digit characters
Return type:str
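
Example (the exact mask value is internal to the module, so it is not asserted here; the point is that different numbers normalize to the same token form):

    from mindmeld.models.helpers import mask_numerics

    print(mask_numerics("1234"))       # a fully numeric token is masked
    print(mask_numerics("flight101"))  # embedded digits are masked as well
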
mindmeld.models.helpers.merge_gazetteer_resource(resource, dynamic_resource, text_preparation_pipeline)[source]

Returns a new resource that merges the original resource with the dynamic resource passed in, for the gazetteer values only

Parameters:
  • resource (dict) -- The original resource built from the app
  • dynamic_resource (dict) -- The dynamic resource passed in
  • text_preparation_pipeline (TextPreparationPipeline) -- For text tokenization and normalization
Returns:The merged resource
Return type:dict

mindmeld.models.helpers.np_encoder(val)[source]
mindmeld.models.helpers.register_annotator(annotator_class_name, annotator_class)[source]

Registers an Annotator class for use with create_annotator()

Parameters:
  • annotator_class_name (str) -- The annotator class name as specified in the config
  • annotator_class (class) -- The annotator class to register
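
Example (a hedged sketch; CustomAnnotator is a hypothetical class -- a real one would subclass MindMeld's Annotator):

    from mindmeld.models.helpers import register_annotator

    class CustomAnnotator:
        """Hypothetical annotator used only to illustrate registration."""

        def __init__(self, config):
            self.config = config

    # Make the class available to create_annotator() under its config name.
    register_annotator("CustomAnnotator", CustomAnnotator)
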
mindmeld.models.helpers.register_augmentor(augmentor_name, augmentor_class)[source]

Registers an Augmentor class under the given name for use by the data augmentation framework

Parameters:
  • augmentor_name (str) -- The augmentor name as specified in the config
  • augmentor_class (class) -- The augmentor class to register
mindmeld.models.helpers.register_embedder(embedder_type, embedder)[source]
mindmeld.models.helpers.register_entity_feature(feature_name)[source]

Registers an entity feature extractor

Parameters:feature_name (str) -- The name of the entity feature
Returns:the feature extractor
Return type:(func)
mindmeld.models.helpers.register_feature(feature_type, feature_name)[source]

Decorator for adding feature extractor mappings to FEATURE_MAP

Parameters:
  • feature_type -- 'query' or 'entity'
  • feature_name -- The name of the feature, used in config.py
Returns:the feature extractor
Return type:(func)
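
Example (a hedged sketch; the (query, resources) extractor signature and the use of normalized tokens are assumptions based on the built-in feature extractors):

    from mindmeld.models.helpers import register_feature

    @register_feature("query", "example-length")
    def extract_example_length(**kwargs):
        """Hypothetical feature: the number of tokens in the query."""

        def _extractor(query, resources):
            return {"example_length": len(query.normalized_tokens)}

        return _extractor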

mindmeld.models.helpers.register_label(label_type, label_encoder)[source]

Registers a label encoder for use with get_label_encoder()

Parameters:
  • label_type (str) -- The label type of the label encoder
  • label_encoder (LabelEncoder) -- The label encoder class to register
Raises:ValueError -- If the label type is already registered

mindmeld.models.helpers.register_model(model_type, model_class)[source]

Registers a model for use with create_model()

Parameters:
  • model_type (str) -- The model type as specified in model configs
  • model_class (class) -- The model to register
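
Example (MyTextModel is a hypothetical class; a real model would implement the Model interface expected by create_model()):

    from mindmeld.models.helpers import register_model

    class MyTextModel:
        """Hypothetical model used only to illustrate registration."""

        def __init__(self, config):
            self.config = config

    # Let create_model() build this class when a config requests 'my_text'.
    register_model("my_text", MyTextModel)
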
mindmeld.models.helpers.register_query_feature(feature_name)[source]

Registers a query feature extractor

Parameters:feature_name (str) -- The name of the query feature
Returns:the feature extractor
Return type:(func)
mindmeld.models.helpers.requires(resource)[source]

Decorator to enforce the resource dependencies of the active feature extractors

Parameters:resource (str) -- the key of a classifier resource which must be initialized before the given feature extractor is used
Returns:the feature extractor
Return type:(func)
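
Example (a hedged sketch; the 'gazetteers' resource key, the decorator stacking order, and the extractor signature are assumptions modeled on the built-in gazetteer features):

    from mindmeld.models.helpers import register_query_feature, requires

    @register_query_feature(feature_name="in-gaz-flag")
    @requires("gazetteers")
    def extract_in_gaz_flag(**kwargs):
        """Hypothetical feature that needs gazetteers to be loaded first."""

        def _extractor(query, resources):
            # The decorator ensures resources['gazetteers'] is initialized
            # before this extractor runs.
            return {"in_gaz": int(bool(resources.get("gazetteers")))}

        return _extractor
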
mindmeld.models.helpers.sequence_accuracy_scoring(y_true, y_pred)[source]

Accuracy score that counts two sequences as equal only if all of their predicted tags match.

Parameters:
  • y_true (list) -- A sequence of true expected labels
  • y_pred (list) -- A sequence of predicted labels
Returns:The sequence-level accuracy when comparing the predicted labels against the true expected labels
Return type:float
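
Example (assuming each element of y_true and y_pred is itself a full tag sequence for one query):

    from mindmeld.models.helpers import sequence_accuracy_scoring

    y_true = [["B-city", "O"], ["O", "B-time", "I-time"]]
    y_pred = [["B-city", "O"], ["O", "B-time", "O"]]

    # Only the first sequence matches in its entirety, so the expected
    # sequence-level accuracy is 0.5.
    print(sequence_accuracy_scoring(y_true, y_pred))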

mindmeld.models.helpers.sequence_tag_accuracy_scoring(y_true, y_pred)[source]

Accuracy score that measures the fraction of individual tags predicted correctly.

Parameters:
  • y_true (list) -- A sequence of true expected labels
  • y_pred (list) -- A sequence of predicted labels
Returns:The tag-level accuracy when comparing the predicted labels against the true expected labels
Return type:float
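
Example (same data and input-shape assumption as above; 4 of the 5 individual tags match, so the expected tag-level accuracy is 0.8):

    from mindmeld.models.helpers import sequence_tag_accuracy_scoring

    y_true = [["B-city", "O"], ["O", "B-time", "I-time"]]
    y_pred = [["B-city", "O"], ["O", "B-time", "O"]]

    print(sequence_tag_accuracy_scoring(y_true, y_pred))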