mindmeld.models.nn_utils.sequence_classification module

Custom modules, built on top of nn layers, that perform sequence classification.

class mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification[source]

Bases: mindmeld.models.nn_utils.classification.BaseClassification

Base class that defines all the necessary elements to successfully train and run inference with custom PyTorch modules built on top of it. Classes derived from this base can be trained for sequence classification.

forward(batch_data)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

predict(examples)[source]

Returns predicted class labels

Parameters: examples (List[str]) -- The list of examples for which predictions are computed and returned.
predict_proba(examples)[source]

Returns predicted class probabilities

Parameters: examples (List[str]) -- The list of examples for which class prediction probabilities are computed and returned.
classification_type
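
The relationship between predict and predict_proba can be illustrated with a minimal sketch (an assumed relationship for illustration, not code taken from MindMeld's source): predicted labels typically correspond to the argmax of the per-class probabilities obtained by softmaxing the model's logits.

```python
import torch

# Illustrative only: two examples, three classes. The logits here are
# dummy values standing in for a trained model's raw outputs.
logits = torch.tensor([[1.2, 0.3, -0.5], [0.1, 2.0, 0.4]])
probs = torch.softmax(logits, dim=-1)  # each row sums to 1 (predict_proba)
labels = probs.argmax(dim=-1)          # class index per example (predict)
print(labels.tolist())  # [0, 1]
```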
class mindmeld.models.nn_utils.sequence_classification.BertForSequenceClassification[source]

Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification

fit(examples, labels, **params)[source]

Trains the underlying neural model on the input data and retains the best-scoring model across all training iterations.

Because neural models can be large, rather than keeping a copy of the best set of model weights in RAM, it is advisable to dump checkpoints to a temporary folder and, once training completes, load the best checkpoint's weights from disk.

Parameters:
  • examples (List[str]) -- A list of text strings that will be used for model training and validation
  • labels (Union[List[int], List[List[int]]]) -- A list of labels passed in as integers corresponding to the examples. The encoded labels must have values between 0 and n_classes-1 -- one label per example in case of sequence classification and a sequence of labels per example in case of token classification
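
The checkpointing pattern described above can be sketched as follows. This is an assumed illustration of the general technique, not MindMeld's actual training loop; the validation scores are dummy values.

```python
import os
import tempfile
import torch
import torch.nn as nn

# Sketch: save each iteration's weights to a temp folder instead of RAM,
# track the best validation score, and reload the best checkpoint at the end.
model = nn.Linear(4, 2)  # stand-in for a large neural model
best_score, best_path = float("-inf"), None

with tempfile.TemporaryDirectory() as tmpdir:
    for epoch, val_score in enumerate([0.61, 0.74, 0.69]):  # dummy scores
        path = os.path.join(tmpdir, f"ckpt_epoch_{epoch}.pt")
        torch.save(model.state_dict(), path)  # weights live on disk, not RAM
        if val_score > best_score:
            best_score, best_path = val_score, path
    # upon completing training, restore the best-scoring checkpoint
    model.load_state_dict(torch.load(best_path))

print(best_score)  # 0.74
```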
class mindmeld.models.nn_utils.sequence_classification.CnnForSequenceClassification[source]

Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification

A CNN module that operates on a batched sequence of token ids. The tokens could be characters or words or sub-words. This module finally outputs one 1D representation for each instance in the batch (i.e. [BS, EMB_DIM]).

The forward method of this module expects only padded token ids as input.
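
A minimal sketch of this kind of CNN pooling (an illustration of the general technique, with assumed hyperparameters, not MindMeld's implementation): embed padded token ids, apply 1D convolutions of several kernel sizes, and max-pool each feature map over time to get one fixed-size vector per instance.

```python
import torch
import torch.nn as nn

class CnnPoolingSketch(nn.Module):
    """Sketch: conv over padded token ids, max-pooled over the sequence
    dimension to one vector per instance, i.e. [BS, EMB_DIM]."""
    def __init__(self, vocab_size=100, emb_dim=32, num_kernels=16,
                 kernel_sizes=(2, 3, 4), padding_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=padding_idx)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_kernels, k) for k in kernel_sizes
        )

    def forward(self, padded_token_ids):  # [BS, SEQ_LEN]
        emb = self.embedding(padded_token_ids).transpose(1, 2)  # [BS, EMB_DIM, SEQ_LEN]
        # max over time for each conv filter, then concatenate
        pooled = [conv(emb).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)  # [BS, num_kernels * len(kernel_sizes)]

batch = torch.tensor([[5, 6, 7, 0, 0], [8, 9, 10, 11, 12]])  # 0 = padding
out = CnnPoolingSketch()(batch)
print(out.shape)  # torch.Size([2, 48])
```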

class mindmeld.models.nn_utils.sequence_classification.EmbedderForSequenceClassification[source]

Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification

An embedder pooling module that operates on a batched sequence of token ids. The tokens could be characters or words or sub-words. This module finally outputs one 1D representation for each instance in the batch (i.e. [BS, EMB_DIM]).

The forward method of this module expects padded token ids along with the number of tokens per instance in the batch.

Additionally, one can set different coefficients for different tokens of the embedding matrix (e.g. tf-idf weights).
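
The weighted pooling described above can be sketched as follows (a hypothetical illustration with assumed names and defaults, not MindMeld's code): each vocabulary entry gets a coefficient (e.g. a tf-idf weight), and token embeddings are combined into one vector per instance using those coefficients and the true lengths.

```python
import torch
import torch.nn as nn

class WeightedEmbeddingPoolSketch(nn.Module):
    """Sketch: pool token embeddings into [BS, EMB_DIM], scaling each
    token's embedding by a per-vocabulary-entry coefficient."""
    def __init__(self, vocab_size=50, emb_dim=8, token_weights=None, padding_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=padding_idx)
        if token_weights is None:
            token_weights = torch.ones(vocab_size)  # uniform coefficients
        token_weights[padding_idx] = 0.0            # padding contributes nothing
        self.register_buffer("token_weights", token_weights)

    def forward(self, padded_token_ids, lengths):
        emb = self.embedding(padded_token_ids)                  # [BS, SEQ_LEN, EMB_DIM]
        w = self.token_weights[padded_token_ids].unsqueeze(-1)  # [BS, SEQ_LEN, 1]
        summed = (emb * w).sum(dim=1)                           # [BS, EMB_DIM]
        return summed / lengths.unsqueeze(-1).float()           # normalize by length

ids = torch.tensor([[3, 4, 0], [5, 6, 7]])  # 0 = padding
lengths = torch.tensor([2, 3])
out = WeightedEmbeddingPoolSketch()(ids, lengths)
print(out.shape)  # torch.Size([2, 8])
```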

class mindmeld.models.nn_utils.sequence_classification.LstmForSequenceClassification[source]

Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification

An LSTM module that operates on a batched sequence of token ids. The tokens could be characters or words or sub-words. This module finally outputs one 1D representation for each instance in the batch (i.e. [BS, EMB_DIM]).

The forward method of this module expects padded token ids along with the number of tokens per instance in the batch.
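
A minimal sketch of LSTM pooling over padded sequences (an illustration of the general technique with assumed hyperparameters, not MindMeld's implementation): the per-instance lengths are used to pack the padded batch so padding does not influence the final hidden state, which serves as the [BS, EMB_DIM] representation.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class LstmPoolingSketch(nn.Module):
    """Sketch: run an LSTM over padded token ids using the true lengths,
    returning the final hidden state per instance."""
    def __init__(self, vocab_size=100, emb_dim=16, hidden_dim=24, padding_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=padding_idx)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, padded_token_ids, lengths):
        emb = self.embedding(padded_token_ids)  # [BS, SEQ_LEN, EMB_DIM]
        packed = pack_padded_sequence(
            emb, lengths, batch_first=True, enforce_sorted=False
        )  # padding positions are skipped by the LSTM
        _, (h_n, _) = self.lstm(packed)
        return h_n[-1]  # last layer's final hidden state: [BS, hidden_dim]

ids = torch.tensor([[5, 6, 0, 0], [7, 8, 9, 10]])  # 0 = padding
lengths = torch.tensor([2, 4])
out = LstmPoolingSketch()(ids, lengths)
print(out.shape)  # torch.Size([2, 24])
```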

mindmeld.models.nn_utils.sequence_classification.get_sequence_classifier_cls(classifier_type: str, embedder_type: str = None)[source]
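
A factory like this can be sketched as a simple lookup (the mapping logic below is an assumption for illustration only; the class names follow this module, but MindMeld's actual selection rules may differ):

```python
# Hypothetical sketch of a classifier-type factory. Real code would map to
# the classes themselves; strings are used here to keep the sketch standalone.
CLASSIFIER_REGISTRY = {
    "embedder": "EmbedderForSequenceClassification",
    "cnn": "CnnForSequenceClassification",
    "lstm": "LstmForSequenceClassification",
}

def get_sequence_classifier_cls_sketch(classifier_type, embedder_type=None):
    # In this sketch, a BERT-style model is selected via the embedder type
    if embedder_type == "bert":
        return "BertForSequenceClassification"
    try:
        return CLASSIFIER_REGISTRY[classifier_type]
    except KeyError:
        raise ValueError(f"Unknown classifier type: {classifier_type!r}")

print(get_sequence_classifier_cls_sketch("cnn"))  # CnnForSequenceClassification
```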