mindmeld.models.nn_utils.token_classification module

Custom modules built on top of nn layers for token classification.

class mindmeld.models.nn_utils.token_classification.BaseTokenClassification[source]

Bases: mindmeld.models.nn_utils.classification.BaseClassification

Base class that defines all the elements necessary to train and run inference with custom PyTorch modules wrapped on top of it. Classes derived from this base can be trained for sequence tagging, also known as token classification.

forward(batch_data)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.
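
The following is a minimal sketch in plain PyTorch (not specific to this module) illustrating the point above: calling the module instance dispatches through __call__, which runs registered hooks, whereas calling forward() directly bypasses them.

    import torch
    import torch.nn as nn

    class TinyTagger(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(4, 2)

        def forward(self, inputs):
            return self.linear(inputs)

    module = TinyTagger()
    module.register_forward_hook(lambda mod, inp, out: print("hook ran"))

    x = torch.randn(1, 4)
    module(x)          # prints "hook ran": __call__ runs forward() plus hooks
    module.forward(x)  # silent: the registered hook is ignored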

predict(examples)[source]

Returns predicted class labels

Parameters:examples (List[str]) -- The list of examples for which predictions are computed and returned.
predict_proba(examples)[source]

Returns predicted class probabilities

Parameters:examples (List[str]) -- The list of examples for which class prediction probabilities are computed and returned.
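
A hypothetical usage sketch for the two methods above; it assumes clf is an already-fitted instance of a BaseTokenClassification subclass, and the label values in the comments are illustrative rather than real outputs.

    examples = ["book a flight to boston", "play some jazz"]

    labels = clf.predict(examples)        # one label sequence per example,
                                          # e.g. [[0, 0, 0, 0, 3], [0, 0, 5]]
    probas = clf.predict_proba(examples)  # per-token class probabilities
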
classification_type
class mindmeld.models.nn_utils.token_classification.BertForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

fit(examples, labels, **params)[source]

Trains the underlying neural model on the input data and retains the best-scoring model across all training iterations.

Because neural models can be large, instead of retaining a copy of the best model weights in RAM, it is advisable to dump them to a temporary folder and, upon completing the training process, load the best checkpoint weights (a sketch of this pattern follows the parameter list below).

Parameters:
  • examples (List[str]) -- A list of text strings that will be used for model training and validation
  • labels (Union[List[int], List[List[int]]]) -- A list of labels, passed as integers, corresponding to the examples. The encoded labels must have values between 0 and n_classes-1: one label per example in the case of sequence classification, and a sequence of labels per example in the case of token classification
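
The checkpointing strategy described above can be sketched in generic PyTorch as follows; the model, scores, and file name are stand-ins for illustration, not the internals of this class.

    import os
    import tempfile
    import torch
    import torch.nn as nn

    model = nn.Linear(8, 3)                  # stand-in for the underlying neural model
    checkpoint_dir = tempfile.mkdtemp()
    best_path = os.path.join(checkpoint_dir, "best_checkpoint.bin")
    best_score = float("-inf")

    for epoch in range(5):
        dev_score = float(epoch)             # placeholder for a real validation score
        if dev_score > best_score:           # keep only the best-scoring weights on disk
            best_score = dev_score
            torch.save(model.state_dict(), best_path)

    model.load_state_dict(torch.load(best_path))  # restore the best checkpoint after training
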
class mindmeld.models.nn_utils.token_classification.CharCnnWithWordLstmForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

class mindmeld.models.nn_utils.token_classification.CharLstmWithWordLstmForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

class mindmeld.models.nn_utils.token_classification.EmbedderForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

class mindmeld.models.nn_utils.token_classification.LstmForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

An LSTM module that operates on a batched sequence of token ids; the tokens can be characters, words, or sub-words. The module takes an additional input that determines how the sequence of embeddings produced by the LSTM layers for each instance in the batch should be split. Once split, the sub-groups of embeddings (each corresponding to a word or a phrase) are collapsed to a 1D representation per sub-group through pooling operations. The module thus outputs a 2D representation for each instance in the batch (i.e. a batched output of shape [BS, SEQ_LEN', EMB_DIM]).
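
A minimal sketch (plain PyTorch, with assumed shapes and split lengths) of the split-and-pool step described above, shown for a single instance in the batch:

    import torch

    lstm_out = torch.randn(10, 16)   # [SEQ_LEN, EMB_DIM] for one batch instance
    split_lengths = [3, 4, 3]        # e.g. number of sub-word tokens per word;
                                     # stands in for the module's additional input

    pooled = torch.stack([
        group.mean(dim=0)            # collapse each sub-group to one vector (mean pooling)
        for group in torch.split(lstm_out, split_lengths, dim=0)
    ])
    print(pooled.shape)              # torch.Size([3, 16]), i.e. [SEQ_LEN', EMB_DIM]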

mindmeld.models.nn_utils.token_classification.get_token_classifier_cls(classifier_type: str, embedder_type: str = None)[source]
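
A hypothetical usage sketch, assuming this function maps a classifier type string (and optionally an embedder type) to one of the classes above; the "lstm" value and the no-argument constructor are assumptions, not values confirmed by this reference.

    from mindmeld.models.nn_utils.token_classification import get_token_classifier_cls

    classifier_cls = get_token_classifier_cls(classifier_type="lstm")  # assumed type name
    classifier = classifier_cls()  # assumes a no-argument constructor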