mindmeld.models.nn_utils.classification module

Base for custom modules, built on top of nn layers, that perform sequence or token classification

class mindmeld.models.nn_utils.classification.BaseClassification[source]

Bases: torch.nn.modules.module.Module

A base class for sequence & token classification using deep neural nets. Both classification
submodules share a common fit() method defined in this base class, which drives the training of the PyTorch-based deep nets. The net’s computational graph is defined only when the fit() method is called. This base class also holds a few common utility methods and further defines the skeleton of the child classes through abstract methods.
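
A typical lifecycle of a concrete subclass is to fit on labeled text and then predict on new text (dumping and loading of the trained state is covered under dump() and load() below). The sketch below is illustrative only and assumes SomeSequenceClassifier is a hypothetical concrete subclass of BaseClassification; substitute whichever sequence or token classification subclass is in use:

    # SomeSequenceClassifier stands in for any concrete subclass of BaseClassification
    model = SomeSequenceClassifier()

    examples = ["book a flight to boston", "what is the weather today"]
    labels = [0, 1]  # one integer label per example (sequence classification)

    model.fit(examples, labels)   # the computational graph is built here, then trained
    predictions = model.predict(["cancel my reservation"])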
dump(path: str)[source]

Dumps the underlying torch.nn model, the encoder state, and params

Parameters: path (str) – The path header under which the files are dumped.
The following states are dumped into separate files:
  • PyTorch model weights
  • Encoder state
  • Params (including params such as tokenizer_type and emb_dim that are used during
    loading to create the encoder and the forward graph)
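
Since path acts as a header (a prefix rather than a single file name), several files are written from it. A minimal sketch on an already-fitted model (the exact file names derived from the prefix depend on the implementation):

    # Writes the PyTorch weights, the encoder state and the params as
    # separate files under the given path header
    model.dump("./checkpoints/intent_clf")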
fit(examples: List[str], labels: Union[List[int], List[List[int]]], **params)[source]

Trains the underlying neural model on the input data and retains the best-scoring model across all training iterations.

Because neural models can be large, instead of keeping a copy of the best model weights in RAM, it is advisable to dump them to a temporary folder and, once training completes, load the best checkpoint weights back.

Parameters:
  • examples (List[str]) – A list of text strings that will be used for model training and validation
  • labels (Union[List[int], List[List[int]]]) – A list of labels, passed in as integers, corresponding to the examples. The encoded labels must have values between 0 and n_classes-1: one label per example for sequence classification, and a sequence of labels per example for token classification
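
The shape of labels selects the task: a flat list of ints for sequence classification, or one list of ints per example for token classification. An illustrative sketch (seq_model and tok_model stand in for hypothetical concrete subclasses, and the label values are made up):

    # Sequence classification: one label per example
    seq_model.fit(["play some jazz", "stop the music"], [0, 1])

    # Token classification: one label per token of each example
    tok_model.fit(["fly from boston to denver"], [[0, 0, 1, 0, 1]])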
forward(batch_data: mindmeld.models.nn_utils.helpers.BatchData) → mindmeld.models.nn_utils.helpers.BatchData[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
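
In line with the note above, prefer invoking the module instance over calling forward() directly, so that registered hooks are run (batch_data here is assumed to be a BatchData produced by the configured encoder):

    outputs = model(batch_data)            # preferred: runs any registered hooks
    # outputs = model.forward(batch_data)  # works, but silently skips the hooks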

get_default_params() → Dict[source]
classmethod load(path: str)[source]

Loads states from a dumped path

Parameters: path (str) – The path header under which the dumped files are present.
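
Since load() is a classmethod, it is called on the class itself with the same path header that was previously passed to dump(). A brief sketch, again using a hypothetical concrete subclass:

    # Restore the states dumped earlier under the same path header
    model = SomeSequenceClassifier.load("./checkpoints/intent_clf")
    predictions = model.predict(["cancel my reservation"])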
log_and_return_model_info(verbose: bool = False) → str[source]

Logs and returns the details of the underlying torch.nn model, such as the disk space it occupies when dumped, the number of parameters, the device on which the model is placed, etc.

Parameters: verbose (bool) – Determines the amount of information to be logged and returned.
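
For example, on an already-fitted model:

    info = model.log_and_return_model_info(verbose=True)  # details are logged as well as returned
    print(info)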
predict(examples: List[str]) → Union[List[int], List[List[int]]][source]

Returns predicted class labels

Parameters: examples (List[str]) – The list of examples for which predictions are computed and returned.
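
The return shape mirrors the labels passed to fit(): a flat list of ints for sequence classification, and one list of ints per example for token classification. An illustrative sketch (the outputs shown are made up):

    seq_model.predict(["play some jazz"])             # e.g. [0]
    tok_model.predict(["fly from boston to denver"])  # e.g. [[0, 0, 1, 0, 1]]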
predict_proba(examples: List[str]) → Union[List[List[int]], List[List[List[int]]]][source]

Returns predicted class probabilities

Parameters: examples (List[str]) – The list of examples for which class prediction probabilities are computed and returned.
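
Each example yields one score per class (and, for token classification, one such list per token), matching the nested return type above. An illustrative sketch (the values shown are made up):

    probas = seq_model.predict_proba(["play some jazz"])
    # probas[0] holds one score per class for the first example,
    # e.g. something like [0.93, 0.07] for a two-class model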
to_device(batch_data: mindmeld.models.nn_utils.helpers.BatchData) → mindmeld.models.nn_utils.helpers.BatchData[source]

Places PyTorch tensors on the device configured through the params

Parameters: batch_data (BatchData) – A BatchData object consisting of different tensor objects
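
This is typically applied to a batch just before the forward pass so that all of its tensors live on the configured device (CPU or a CUDA device). A minimal sketch:

    batch_data = model.to_device(batch_data)  # move the batch's tensors to the configured device
    outputs = model(batch_data)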
classification_type