mindmeld.models.containers module

class mindmeld.models.containers.GloVeEmbeddingsContainer(token_dimension=300, token_pretrained_embedding_filepath=None)[source]

Bases: object

This class is responsible for downloading, extracting, and storing word embeddings in the GloVe format.

To avoid loading the large GloVe embeddings file into memory every time a new container is created, loaded embeddings are cached in a class-level hashmap.

TODO: refactor the call signature to match the other containers by accepting pretrained_path_or_name instead of a token dimension and a filepath, and deprecate those two arguments.

get_pretrained_word_to_embeddings_dict()[source]

Returns the word to embedding dict.

Returns: the word-to-embedding mapping.
Return type: (dict)
ALLOWED_WORD_EMBEDDING_DIMENSIONS = [50, 100, 200, 300]
CONTAINER_LOOKUP = {}
EMBEDDING_FILE_PATH_TEMPLATE = 'glove.6B.{}d.txt'
class mindmeld.models.containers.HuggingfaceTransformersContainer(pretrained_model_name_or_path, quantize_model=False, cache_lookup=True, from_configs=False)[source]

Bases: object

This class is responsible for downloading and extracting transformer models such as
BERT, Multilingual-BERT, etc., in the https://github.com/huggingface/transformers format.

To avoid loading a large transformer model into memory every time a new container is created, loaded models are cached in a class-level hashmap.

get_model_bunch()[source]
get_transformer_model()[source]
get_transformer_model_config()[source]
get_transformer_model_tokenizer()[source]
CONTAINER_LOOKUP = {}
class mindmeld.models.containers.SentenceTransformersContainer(pretrained_name_or_abspath, bert_output_type='mean', quantize_model=False)[source]

Bases: object

This class is responsible for downloading and extracting sentence-transformer models in the https://github.com/UKPLab/sentence-transformers format.

To avoid loading a large sentence-transformer model into memory every time a new container is created, loaded models are cached in a class-level hashmap.

get_model_bunch()[source]
get_pooling_model()[source]
get_sbert_model()[source]
get_transformer_model()[source]
CONTAINER_LOOKUP = {}
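The bert_output_type='mean' default refers to mean pooling, the standard way sentence-transformers models collapse per-token BERT outputs into a single sentence vector. As a rough, library-free sketch of what mean pooling computes (mean_pool and its arguments are illustrative, not part of the MindMeld API):

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: list of equal-length vectors, one per token.
    attention_mask: 1 for real tokens, 0 for padding (padding is excluded).
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    return [s / count for s in summed]
```

In a real pipeline the inputs would be the transformer's last-hidden-state tensor and the tokenizer's attention mask, and the arithmetic would be vectorized, but the pooling itself is just this masked average.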
class mindmeld.models.containers.TqdmUpTo(iterable=None, desc=None, total=None, leave=True, file=None, ncols=None, mininterval=0.1, maxinterval=10.0, miniters=None, ascii=None, disable=False, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, postfix=None, unit_divisor=1000, write_bytes=None, lock_args=None, nrows=None, colour=None, delay=0, gui=False, **kwargs)[source]

Bases: tqdm.std.tqdm

Provides update_to(n) which uses tqdm.update(delta_n).

update_to(b=1, bsize=1, tsize=None)[source]

Reports update statistics on the download progress.

Parameters:
  • b (int) -- Number of blocks transferred so far [default: 1].
  • bsize (int) -- Size of each block (in tqdm units) [default: 1].
  • tsize (int) -- Total size (in tqdm units). If None [default], the total remains unchanged.