mindmeld.models.nn_utils package

class mindmeld.models.nn_utils.EmbedderForSequenceClassification[source]

Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification

An embedder pooling module that operates on a batched sequence of token ids. The tokens could be characters, words, or sub-words. This module outputs one 1D representation for each instance in the batch (i.e. [BS, EMB_DIM]).

The forward method of this module expects padded token ids along with the number of tokens per instance in the batch.

Additionally, one can set different coefficients for different tokens of the embedding matrix (e.g. tf-idf weights).
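
The pooling step can be illustrated outside of MindMeld. Below is a minimal standalone PyTorch sketch (all names hypothetical, not MindMeld's internals) that turns padded token ids plus per-instance lengths into one [BS, EMB_DIM] vector per instance, with optional per-token coefficients such as tf-idf weights:

    import torch
    import torch.nn as nn

    def pooled_embeddings(token_ids, lengths, embedding, coefficients=None):
        # token_ids: [BS, SEQ_LEN] padded token ids; lengths: [BS]
        embs = embedding(token_ids)                       # [BS, SEQ_LEN, EMB_DIM]
        mask = (torch.arange(token_ids.shape[1])[None, :]
                < lengths[:, None]).float()               # 1.0 for real tokens, 0.0 for pads
        if coefficients is not None:
            # scale each token's contribution, e.g. by a tf-idf weight
            mask = mask * coefficients(token_ids).squeeze(-1)
        summed = (embs * mask.unsqueeze(-1)).sum(dim=1)   # zero out pads, sum over time
        return summed / mask.sum(dim=1, keepdim=True).clamp(min=1e-8)  # [BS, EMB_DIM]

    # usage: vocab of 100 ids, 8-dim embeddings, batch of 2 padded sequences
    emb = nn.Embedding(100, 8, padding_idx=0)
    weights = nn.Embedding(100, 1)  # stand-in for per-token tf-idf coefficients
    ids = torch.tensor([[5, 7, 2, 0], [9, 3, 0, 0]])
    out = pooled_embeddings(ids, torch.tensor([3, 2]), emb, weights)
    print(out.shape)  # torch.Size([2, 8])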

class mindmeld.models.nn_utils.CnnForSequenceClassification[source]

Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification

A CNN module that operates on a batched sequence of token ids. The tokens could be characters, words, or sub-words. This module outputs one 1D representation for each instance in the batch (i.e. [BS, EMB_DIM]).

The forward method of this module expects only padded token ids as input.
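
As a rough illustration of the CNN flavor (again a standalone PyTorch sketch, not MindMeld's actual implementation), convolutions slide over the embedded token sequence and max-pooling over time collapses the result to a single vector per instance, which is why only the padded token ids are needed as input:

    import torch
    import torch.nn as nn

    class TinyCnnClassifier(nn.Module):
        """Embeds token ids, convolves over time, max-pools to [BS, FILTERS]."""
        def __init__(self, vocab_size=100, emb_dim=8, num_filters=8, kernel_size=3):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size, padding=1)

        def forward(self, token_ids):                 # [BS, SEQ_LEN]
            embs = self.embedding(token_ids)          # [BS, SEQ_LEN, EMB_DIM]
            feats = self.conv(embs.transpose(1, 2))   # [BS, FILTERS, SEQ_LEN]
            return feats.max(dim=2).values            # max over time -> [BS, FILTERS]

    model = TinyCnnClassifier()
    print(model(torch.tensor([[5, 7, 2, 0], [9, 3, 0, 0]])).shape)  # torch.Size([2, 8])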

class mindmeld.models.nn_utils.LstmForSequenceClassification[source]

Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification

An LSTM module that operates on a batched sequence of token ids. The tokens could be characters, words, or sub-words. This module outputs one 1D representation for each instance in the batch (i.e. [BS, EMB_DIM]).

The forward method of this module expects padded token ids along with the number of tokens per instance in the batch.
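
The per-instance token counts matter here because they let the recurrence stop at each sequence's true end rather than running over pad positions. A minimal standalone PyTorch sketch of the idea (hypothetical names, not MindMeld's code):

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence

    embedding = nn.Embedding(100, 8, padding_idx=0)
    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

    token_ids = torch.tensor([[5, 7, 2, 0], [9, 3, 0, 0]])   # padded to SEQ_LEN=4
    lengths = torch.tensor([3, 2])                            # real token counts

    # Packing tells the LSTM where each sequence truly ends, so pad
    # positions never contribute to the final hidden state.
    packed = pack_padded_sequence(embedding(token_ids), lengths,
                                  batch_first=True, enforce_sorted=False)
    _, (h_n, _) = lstm(packed)
    print(h_n[-1].shape)  # one 1D representation per instance: torch.Size([2, 16])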

class mindmeld.models.nn_utils.BertForSequenceClassification[source]

Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification

fit(examples, labels, **params)[source]

Trains the underlying neural model on the input data and retains the best-scoring model across all iterations.

Because neural models can be large, rather than keeping a copy of the best model weights in RAM, it is advisable to dump them to a temporary folder and load the best checkpoint weights once training completes.
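
The checkpoint-to-disk pattern described above can be sketched in a few lines of generic PyTorch (this is the general idiom, not MindMeld's exact training loop; train_one_epoch and score are hypothetical callables):

    import os
    import tempfile
    import torch

    def train_with_disk_checkpointing(model, train_one_epoch, score, num_epochs):
        best_score = float("-inf")
        with tempfile.TemporaryDirectory() as tmp_dir:
            ckpt_path = os.path.join(tmp_dir, "best_model.bin")
            for _ in range(num_epochs):
                train_one_epoch(model)
                current = score(model)
                if current > best_score:  # best weights go to disk, not RAM
                    best_score = current
                    torch.save(model.state_dict(), ckpt_path)
            # after training, restore the best checkpoint
            model.load_state_dict(torch.load(ckpt_path))
        return model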

Parameters:
  • examples (List[str]) -- A list of text strings that will be used for model training and validation
  • labels (Union[List[int], List[List[int]]]) -- A list of labels, passed in as integers, corresponding to the examples. The encoded labels must have values between 0 and n_classes-1: one label per example for sequence classification, or a sequence of labels per example for token classification
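
A minimal usage sketch for fit() based on the documented signature (the training data here is hypothetical, and the class is assumed to be constructible with defaults):

    from mindmeld.models.nn_utils import BertForSequenceClassification

    clf = BertForSequenceClassification()  # assumed default construction
    examples = ["book a table for two", "cancel my reservation"]  # hypothetical data
    labels = [0, 1]  # integers in [0, n_classes - 1], one label per example
    clf.fit(examples, labels)
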
class mindmeld.models.nn_utils.EmbedderForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

class mindmeld.models.nn_utils.LstmForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

An LSTM module that operates on a batched sequence of token ids. The tokens could be characters, words, or sub-words. This module takes an additional input that determines how the sequence of embeddings obtained from the LSTM layers for each instance in the batch should be split. Once split, the sub-groups of embeddings (each corresponding to a word or a phrase) can be collapsed into one 1D representation per sub-group through pooling operations. Finally, this module outputs a 2D representation for each instance in the batch (i.e. [BS, SEQ_LEN', EMB_DIM]); see the sketch below.
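
The split-then-pool step can be illustrated independently of MindMeld. A standalone PyTorch sketch (hypothetical names) where a per-instance list of group sizes dictates how consecutive sub-word embeddings are pooled into one vector per word, yielding [SEQ_LEN', EMB_DIM] per instance:

    import torch

    def pool_subword_groups(embs, group_sizes):
        # embs: [SEQ_LEN, EMB_DIM] for one instance; group_sizes: tokens per word
        pooled = [chunk.mean(dim=0)                # collapse each sub-group to 1D
                  for chunk in torch.split(embs, group_sizes, dim=0)]
        return torch.stack(pooled)                 # [SEQ_LEN', EMB_DIM]

    # e.g. 5 sub-word embeddings belonging to 3 words (group sizes 2, 1, 2)
    embs = torch.randn(5, 8)
    print(pool_subword_groups(embs, [2, 1, 2]).shape)  # torch.Size([3, 8])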

class mindmeld.models.nn_utils.CharCnnWithWordLstmForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

class mindmeld.models.nn_utils.CharLstmWithWordLstmForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

class mindmeld.models.nn_utils.BertForTokenClassification[source]

Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification

fit(examples, labels, **params)[source]

Trains the underlying neural model on the input data and retains the best-scoring model across all iterations.

Because neural models can be large, rather than keeping a copy of the best model weights in RAM, it is advisable to dump them to a temporary folder and load the best checkpoint weights once training completes.

Parameters:
  • examples (List[str]) -- A list of text strings that will be used for model training and validation
  • labels (Union[List[int], List[List[int]]]) -- A list of labels, passed in as integers, corresponding to the examples. The encoded labels must have values between 0 and n_classes-1: one label per example for sequence classification, or a sequence of labels per example for token classification
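
For token classification, fit() expects one sequence of labels per example, aligned with that example's tokens (hypothetical data; default construction assumed):

    from mindmeld.models.nn_utils import BertForTokenClassification

    clf = BertForTokenClassification()  # assumed default construction
    examples = ["meet me in new york"]  # hypothetical data
    labels = [[0, 0, 0, 1, 1]]          # one label per token, values in [0, n_classes - 1]
    clf.fit(examples, labels)
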
class mindmeld.models.nn_utils.TokenizerType[source]

Bases: enum.Enum

An enumeration.

BPE_TOKENIZER = 'bpe-tokenizer'
CHAR_TOKENIZER = 'char-tokenizer'
HUGGINGFACE_PRETRAINED_TOKENIZER = 'huggingface_pretrained-tokenizer'
WHITESPACE_AND_CHAR_DUAL_TOKENIZER = 'whitespace_and_char-tokenizer'
WHITESPACE_TOKENIZER = 'whitespace-tokenizer'
WORDPIECE_TOKENIZER = 'wordpiece-tokenizer'
class mindmeld.models.nn_utils.EmbedderType[source]

Bases: enum.Enum

An enumeration.

BERT = 'bert'
GLOVE = 'glove'
NONE = None
class mindmeld.models.nn_utils.SequenceClassificationType[source]

Bases: enum.Enum

An enumeration.

CNN = 'cnn'
EMBEDDER = 'embedder'
LSTM = 'lstm'
class mindmeld.models.nn_utils.TokenClassificationType[source]

Bases: enum.Enum

An enumeration.

CNN_LSTM = 'cnn-lstm'
EMBEDDER = 'embedder'
LSTM = 'lstm-pytorch'
LSTM_LSTM = 'lstm-lstm'
mindmeld.models.nn_utils.get_sequence_classifier_cls(classifier_type: str, embedder_type: str = None)[source]
mindmeld.models.nn_utils.get_token_classifier_cls(classifier_type: str, embedder_type: str = None)[source]
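
Putting the enums and factory helpers together, a brief usage sketch (the string values come from the enums above; what each helper returns beyond "a classifier class" is not specified here):

    from mindmeld.models.nn_utils import (
        get_sequence_classifier_cls, get_token_classifier_cls)

    # 'lstm' and 'glove' are SequenceClassificationType / EmbedderType values;
    # 'lstm-pytorch' is a TokenClassificationType value
    seq_cls = get_sequence_classifier_cls(classifier_type='lstm', embedder_type='glove')
    tok_cls = get_token_classifier_cls(classifier_type='lstm-pytorch')
    print(seq_cls, tok_cls)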