mindmeld.resource_loader module

This module contains the processor resource loader.

class mindmeld.resource_loader.Hasher(algorithm='sha1')[source]

Bases: object

A thin wrapper around hashlib. Uses a cache for commonly hashed strings.

algorithm

str -- The hashing algorithm to use. Defaults to 'sha1'. See hashlib.algorithms_available for a list of options.

hash(string)[source]

Hashes a string.

Parameters:string (str) -- The string to hash
Returns:The hash result
Return type:str
hash_file(filename)[source]

Creates a hash of the file. If the file does not exist, the empty string is hashed instead and the resulting hash digest is returned.

Parameters:filename (str) -- The path of a file to hash.
Returns:A hex digest of the file hash
Return type:str
hash_list(strings)[source]

Hashes a list of strings.

Parameters:strings (list[str]) -- The strings to hash
Returns:The hash result
Return type:str
algorithm

Getter for algorithm property.

Returns:the hashing algorithm
Return type:str
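
The behavior documented for Hasher can be sketched with the standard library alone. SketchHasher below is a hypothetical stand-in, not MindMeld's implementation; how the cache is keyed and how hash_list combines its inputs are assumptions made for illustration.

```python
import hashlib
import os


class SketchHasher:
    """Hypothetical stand-in for a cached hashlib wrapper (not MindMeld's code)."""

    def __init__(self, algorithm="sha1"):
        # hashlib.new raises ValueError for an unknown algorithm name.
        hashlib.new(algorithm)
        self.algorithm = algorithm
        self._cache = {}  # assumption: cache keyed by the input string

    def hash(self, string):
        """Return the hex digest of a string, reusing cached results."""
        if string not in self._cache:
            digest = hashlib.new(self.algorithm, string.encode("utf-8")).hexdigest()
            self._cache[string] = digest
        return self._cache[string]

    def hash_list(self, strings):
        """Assumption: feed every string into one hash object, in order."""
        hasher = hashlib.new(self.algorithm)
        for item in strings:
            hasher.update(item.encode("utf-8"))
        return hasher.hexdigest()

    def hash_file(self, filename):
        """Hash a file's bytes; a missing file hashes like the empty string."""
        hasher = hashlib.new(self.algorithm)
        if os.path.isfile(filename):
            with open(filename, "rb") as handle:
                # Read in chunks so large files are not loaded at once.
                for chunk in iter(lambda: handle.read(4096), b""):
                    hasher.update(chunk)
        return hasher.hexdigest()
```

In this sketch, hashing a path that does not exist gives the same digest as hashing the empty string, which mirrors the fallback described for hash_file.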
class mindmeld.resource_loader.ProcessedQueryList(cache=None, elements=None)[source]

Bases: object

ProcessedQueryList provides a memory-efficient, disk-backed list representation for a list of queries.

class DomainIterator(source)[source]

Bases: mindmeld.resource_loader.Iterator

class EntitiesIterator(source, cached=False)[source]

Bases: mindmeld.resource_loader.Iterator

class IntentIterator(source)[source]

Bases: mindmeld.resource_loader.Iterator

class Iterator(source, cached=False)[source]

Bases: object

reorder(indices)[source]
class ListIterator(elements)[source]

Bases: mindmeld.resource_loader.Iterator

ListIterator is a wrapper around an in-memory list that supports the same functionality as a ProcessedQueryList.Iterator. This allows arbitrary lists of data to be built and presented as a ProcessedQueryList.Iterator to functions that require one.

class MemoryCache(queries)[source]

Bases: object

A class to provide cache functionality for in-memory lists of ProcessedQuery objects

get(row_id)[source]
get_domain(row_id)[source]
get_entities(row_id)[source]
get_intent(row_id)[source]
get_query(row_id)[source]
get_raw_query(row_id)[source]
class QueryIterator(source, cached=False)[source]

Bases: mindmeld.resource_loader.Iterator

class RawQueryIterator(source, cached=False)[source]

Bases: mindmeld.resource_loader.Iterator

append(query_id)[source]
domains()[source]
entities()[source]
extend(query_ids)[source]
static from_in_memory_list(queries)[source]

Creates a ProcessedQueryList wrapper around an in-memory list of ProcessedQuery objects

Parameters:queries (list(ProcessedQuery)) -- queries to wrap
Returns:ProcessedQueryList object
intents()[source]
processed_queries()[source]
queries()[source]
raw_queries()[source]
cache
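
The general idea of a list that stores only row ids and resolves fields through a cache can be sketched as follows. The classes SketchMemoryCache and SketchQueryList are hypothetical, plain dicts stand in for ProcessedQuery objects, and using list indices as row ids is an assumption; MindMeld's internals may differ.

```python
class SketchMemoryCache:
    """Hypothetical cache over in-memory records (dicts stand in for ProcessedQuery)."""

    def __init__(self, queries):
        self._queries = list(queries)

    def get(self, row_id):
        return self._queries[row_id]

    def get_domain(self, row_id):
        return self._queries[row_id]["domain"]

    def get_intent(self, row_id):
        return self._queries[row_id]["intent"]


class SketchQueryList:
    """Stores row ids only; field values are fetched from the cache lazily."""

    def __init__(self, cache, elements=None):
        self.cache = cache
        self.elements = list(elements or [])

    def append(self, query_id):
        self.elements.append(query_id)

    def extend(self, query_ids):
        self.elements.extend(query_ids)

    def domains(self):
        # Generator: nothing is read from the cache until iteration.
        return (self.cache.get_domain(row_id) for row_id in self.elements)

    def intents(self):
        return (self.cache.get_intent(row_id) for row_id in self.elements)

    @staticmethod
    def from_in_memory_list(queries):
        # Assumption: row ids are simply positions in the wrapped list.
        cache = SketchMemoryCache(queries)
        return SketchQueryList(cache, range(len(queries)))
```

Because the list holds only ids, swapping the in-memory cache for a disk-backed one would leave the iteration code unchanged in this sketch.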
class mindmeld.resource_loader.ResourceLoader(app_path, query_factory, query_cache=None)[source]

Bases: object

ResourceLoader objects are responsible for loading the resources needed by NLP components (classifiers, entity recognizers, parsers, etc.).

Note: we need to keep resource helpers as instance methods, as load_feature_resource assumes all helpers to be instance methods.

class CharNgramFreqBuilder(lengths, thresholds)[source]

Bases: object

Compiles n-gram character frequency dictionary of normalized query tokens

add(query)[source]
get_resource()[source]
class QueryFreqBuilder(enable_stemming=False)[source]

Bases: object

Compiles frequency dictionary of normalized and stemmed query strings

add(query)[source]
get_resource()[source]
class WordFreqBuilder(enable_stemming=False)[source]

Bases: object

Compiles unigram frequency dictionary of normalized query tokens

add(query)[source]
get_resource()[source]
class WordNgramFreqBuilder(lengths, thresholds, enable_stemming=False)[source]

Bases: object

Compiles n-gram frequency dictionary of normalized query tokens

add(query)[source]
get_resource()[source]
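
The builder classes above share an add/get_resource interface. A minimal sketch of that pattern for word n-grams follows. SketchWordNgramFreqBuilder is hypothetical: it takes plain strings rather than query objects, tokenizes with str.split, and treats each threshold as a minimum count for the n-gram length at the same position, all of which are assumptions.

```python
from collections import Counter


class SketchWordNgramFreqBuilder:
    """Hypothetical n-gram frequency builder (not MindMeld's implementation)."""

    def __init__(self, lengths, thresholds):
        self.lengths = list(lengths)
        # Assumption: thresholds[i] is the minimum count kept for lengths[i].
        self.thresholds = list(thresholds)
        self._counts = {length: Counter() for length in self.lengths}

    def add(self, query):
        # Assumption: `query` is already-normalized text split on whitespace.
        tokens = query.split()
        for length in self.lengths:
            for start in range(len(tokens) - length + 1):
                ngram = " ".join(tokens[start:start + length])
                self._counts[length][ngram] += 1

    def get_resource(self):
        """Return n-gram counts that meet the threshold for their length."""
        resource = {}
        for length, threshold in zip(self.lengths, self.thresholds):
            for ngram, count in self._counts[length].items():
                if count >= threshold:
                    resource[ngram] = count
        return resource
```

Queries are fed in one at a time with add, and the frequency dictionary is compiled once at the end with get_resource.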
build_gazetteer(gaz_name, exclude_ngrams=False, force_reload=False)[source]

Builds the specified gazetteer using the entity data and mapping files.

Parameters:
  • gaz_name (str) -- The name of the entity the gazetteer corresponds to
  • exclude_ngrams (bool, optional) -- Whether partial matches of entities should be excluded from the gazetteer
  • force_reload (bool, optional) -- Whether file should be forcefully reloaded from disk
static create_resource_loader(app_path, query_factory=None, text_preparation_pipeline=None)[source]

Creates the resource loader for the app at app_path.

Parameters:
  • app_path (str) -- The path to the directory containing the app's data
  • query_factory (QueryFactory) -- The app's query factory
  • text_preparation_pipeline (TextPreparationPipeline) -- The app's text preparation pipeline.
Returns:

a resource loader

Return type:

ResourceLoader

filter_file_paths(compiled_pattern, file_paths=None)[source]

Gets a list of file paths that match the compiled pattern.

Parameters:
  • compiled_pattern (sre.SRE_Pattern) -- A compiled regex pattern to filter with.
  • file_paths (list) -- A list of file paths.
Returns:

A list of file paths.

Return type:

list

static flatten_query_tree(query_tree)[source]

Takes a query tree and returns the elements in list form.

Parameters:query_tree (dict) -- A nested dictionary that organizes queries by domain then intent.
Returns:A list of Query objects.
Return type:list
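
Flattening a tree organized by domain and then intent amounts to two nested loops. The function sketch_flatten_query_tree below is a hypothetical illustration that assumes each intent maps to a list of queries; strings stand in for Query objects.

```python
def sketch_flatten_query_tree(query_tree):
    """Collect the leaves of a {domain: {intent: [queries]}} dict into one list."""
    flattened = []
    for intents in query_tree.values():
        for queries in intents.values():
            flattened.extend(queries)
    return flattened
```

Since Python dicts preserve insertion order, the flattened list in this sketch follows the order in which domains and intents were added to the tree.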
get_all_file_paths(file_pattern='.*.txt')[source]

Get a list of text file paths across all intents.

Returns:A list of all file paths.
Return type:list
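
Filtering paths with a compiled regular expression can be sketched as below. The helper sketch_filter_file_paths is hypothetical, and its use of Pattern.match (which anchors only at the start of the string) rather than Pattern.search or Pattern.fullmatch is an assumption about the matching mode.

```python
import re


def sketch_filter_file_paths(compiled_pattern, file_paths):
    """Keep the paths that the compiled pattern matches from the start."""
    return [path for path in file_paths if compiled_pattern.match(path)]


# The dot before "txt" is escaped here so that it matches a literal period.
TXT_PATTERN = re.compile(r".*\.txt")
```

Note that in a pattern such as '.*.txt' the unescaped second dot matches any character, so with Pattern.match a name like "notes_txt" or "train.txt.bak" is also accepted. Escaping the dot and anchoring the end (for example r".*\.txt$") is stricter.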
get_entity_map(entity_type, force_reload=False)[source]

Creates a mapping file for a given entity.

Parameters:entity_type (str) -- The name of the entity
get_flattened_label_set(domain=None, intent=None, label_set=None, force_reload=False)[source]
get_gazetteer(gaz_name, force_reload=False)[source]

Gets a gazetteer by name.

Parameters:gaz_name (str) -- The name of the entity the gazetteer corresponds to
Returns:Gazetteer data
Return type:dict
get_gazetteer_hash(gaz_name)[source]

Gets the hash of a gazetteer by entity name.

Parameters:gaz_name (str) -- The name of the entity the gazetteer corresponds to
Returns:Hash of a gazetteer specified by name.
Return type:str
get_gazetteers(force_reload=False)[source]

Gets gazetteers for all entities.

Returns:Gazetteer data keyed by entity type
Return type:dict
get_gazetteers_hash()[source]

Gets a single hash of all the gazetteers, ordered alphabetically by entity type.

Returns:Hash of a list of gazetteer hashes.
Return type:str
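
Combining per-entity hashes in a fixed order makes the result independent of dictionary ordering. The function sketch_combined_hash below is a hypothetical illustration; the choice of SHA-1 and of concatenating the digests into one hash object are assumptions.

```python
import hashlib


def sketch_combined_hash(hashes_by_entity):
    """Hash per-entity digests in alphabetical order of entity type."""
    hasher = hashlib.sha1()
    for entity_type in sorted(hashes_by_entity):
        hasher.update(hashes_by_entity[entity_type].encode("utf-8"))
    return hasher.hexdigest()
```

Sorting the entity types first means two loaders that discover the same gazetteers in a different order still arrive at the same combined hash.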
get_labeled_queries(domain=None, intent=None, label_set=None, force_reload=False)[source]

Gets labeled queries from the cache, or loads them from disk.

Parameters:
  • domain (str) -- The domain of queries to load
  • intent (str) -- The intent of queries to load
  • force_reload (bool) -- Will not load queries from the cache when True
Returns:

ProcessedQuery objects (or strings) loaded from labeled query files, organized by domain and intent.

Return type:

dict
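
The cache-or-load behavior controlled by force_reload can be sketched as follows. SketchQueryStore and its loader callback are hypothetical names, and keying the cache by a (domain, intent) pair is an assumption made for brevity.

```python
class SketchQueryStore:
    """Hypothetical cache-or-load wrapper illustrating force_reload."""

    def __init__(self, loader):
        # `loader` is a hypothetical callable that reads queries from disk.
        self._loader = loader
        self._cache = {}

    def get(self, domain, intent, force_reload=False):
        key = (domain, intent)
        # Bypass the cache when asked to, or when nothing is cached yet.
        if force_reload or key not in self._cache:
            self._cache[key] = self._loader(domain, intent)
        return self._cache[key]
```

A repeated call with the same arguments is served from the cache, while passing force_reload=True invokes the loader again and replaces the cached entry.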

static get_sentiment_analyzer()[source]

Returns a sentiment analyzer and downloads the required data libraries from nltk.

get_sys_entity_types(labels)[source]

Get all system entity types from the entity labels.

Parameters:labels (list of QueryEntity) -- a list of labeled entities
get_text_preparation_pipeline()[source]

Gets the text preparation pipeline from the query_factory attribute.

Returns:The class responsible for the normalization and tokenization of text.
Return type:TextPreparationPipeline
hash_feature_resource(name)[source]

Hashes the named resource.

Parameters:name (str) -- The name of the resource to hash
Returns:The hash result
Return type:str
hash_list(items)[source]

Hashes the list of items.

Parameters:items (list[str]) -- A list of strings to hash
Returns:The hash result
Return type:str
hash_string(string)[source]

Hashes a string.

Parameters:string (str) -- The string to hash
Returns:The hash result
Return type:str
load_entity_map(entity_type)[source]

Loads an entity mapping file.

Parameters:entity_type (str) -- The name of the entity
load_gazetteer(gaz_name)[source]

Loads a gazetteer specified by the entity name.

Parameters:gaz_name (str) -- The name of the entity the gazetteer corresponds to
load_query_file(domain, intent, file_path)[source]

Loads the queries from the specified file.

Parameters:
  • domain (str) -- The domain of the query file
  • intent (str) -- The intent of the query file
  • file_path (str) -- The name of the query file
RSC_HASH_MAP = {'c_ngram_freq': <function ResourceLoader.<lambda>>, 'enable-stemming': <function ResourceLoader.<lambda>>, 'gazetteers': <function ResourceLoader.get_gazetteers_hash>, 'q_freq': <function ResourceLoader.<lambda>>, 'sys_types': <function ResourceLoader.<lambda>>, 'vader_classifier': <function ResourceLoader.<lambda>>, 'w_freq': <function ResourceLoader.<lambda>>, 'w_ngram_freq': <function ResourceLoader.<lambda>>}
hash_to_model_path

dict -- A dictionary that maps hashes to the file path of the classifier.

query_cache

Lazily loads the query cache, since it is not required for inference.