mindmeld.models.query_features module

This module contains feature extractors for queries

mindmeld.models.query_features.char_ngrams(n, word, **kwargs)[source]

This function extracts character ngrams for the given word

Parameters:
  • n (int) -- Max size of n-gram to extract
  • word (str) -- The word to extract n-grams from
Returns:

A list of character n-grams for the given word

Return type:

list
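As a rough illustration of character n-gram extraction, the sketch below slides a fixed-size window over a word. It is a standalone function, not the library's implementation; the name char_ngrams_sketch and the handling of words shorter than n are assumptions made for this example.

```python
def char_ngrams_sketch(n, word):
    """Return every contiguous substring of length n in word (illustrative only)."""
    if len(word) < n:
        # Assumption: a word shorter than n is returned whole rather than dropped.
        return [word] if word else []
    # Slide a window of width n across the word, one character at a time.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


trigrams = char_ngrams_sketch(3, "hello")
```

Here trigrams holds the three overlapping substrings "hel", "ell" and "llo".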

mindmeld.models.query_features.enabled_stemming(**kwargs)[source]

Feature extractor for enabling stemming of the query

mindmeld.models.query_features.extract_bag_of_words_features(ngram_lengths_to_start_positions, thresholds=(0, ), **kwargs)[source]

Returns a bag-of-words feature extractor.

Parameters:
  • ngram_lengths_to_start_positions (dict) --
  • thresholds (int) -- Cut off value to include word in n-gram vocab
Returns:

(function) The feature extractor.
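The ngram_lengths_to_start_positions argument can be read as a mapping from an n-gram length to the start offsets, relative to the current token, at which n-grams of that length are collected. The sketch below shows one plausible reading of that idea. The function name, the feature-key format, and the choice to skip out-of-range windows are all assumptions, and the thresholds argument is not modeled.

```python
def window_ngram_features(tokens, ngram_lengths_to_start_positions):
    """Build one feature dict per token from n-grams at relative offsets."""
    feat_seq = []
    for i in range(len(tokens)):
        feats = {}
        for length, starts in ngram_lengths_to_start_positions.items():
            for start in starts:
                begin = i + start
                end = begin + length
                # Assumption: windows that fall outside the query are skipped;
                # a real extractor might pad with boundary symbols instead.
                if begin < 0 or end > len(tokens):
                    continue
                # Hypothetical feature-key format, chosen for readability.
                key = "ngram|length:{}|pos:{}".format(length, start)
                feats[key] = " ".join(tokens[begin:end])
        feat_seq.append(feats)
    return feat_seq


# Unigrams at offsets -1, 0, +1 and bigrams starting at the current token.
features = window_ngram_features(["book", "a", "table"], {1: [-1, 0, 1], 2: [0]})
```

Each entry of features corresponds to one token, so the result is parallel to the token list.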

mindmeld.models.query_features.extract_char_ngrams(lengths=(1, ), thresholds=(0, ), **kwargs)[source]

Extract character ngrams of specified lengths.

Parameters:
  • lengths (list of int) -- The ngram length.
  • thresholds (list of int) -- frequency cut off value to include ngram in vocab
Returns:

(function) A feature extraction function that takes a query and returns character ngrams of specified lengths.

mindmeld.models.query_features.extract_char_ngrams_features(ngram_lengths_to_start_positions, thresholds=(0, ), **kwargs)[source]

Returns a character n-gram feature extractor.

Parameters:
  • ngram_lengths_to_start_positions (dict) -- The window of tokens to be considered relative to the current token while extracting char n-grams
  • thresholds (int) -- Cut off value to include word in n-gram vocab
Returns:

(function) The feature extractor.

mindmeld.models.query_features.extract_edge_ngrams(lengths=(1, ), **kwargs)[source]

Extract ngrams of some specified lengths.

Parameters: lengths (list of int) -- The ngram length.
Returns: (function) A feature extraction function that takes a query and returns ngrams of the specified lengths at start and end of query.
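To make "ngrams at start and end of query" concrete, here is a minimal sketch that works on a pre-tokenized query. The function name and the left-edge/right-edge feature keys are hypothetical, and skipping lengths longer than the query is an assumption.

```python
def edge_ngrams_sketch(tokens, lengths=(1,)):
    """Collect the leading and trailing n-gram of each requested length."""
    feats = {}
    for length in lengths:
        # Assumption: lengths that do not fit in the query are ignored.
        if length < 1 or length > len(tokens):
            continue
        feats["left-edge|length:{}".format(length)] = " ".join(tokens[:length])
        feats["right-edge|length:{}".format(length)] = " ".join(tokens[-length:])
    return feats


edges = edge_ngrams_sketch(["play", "some", "jazz", "music"], lengths=(1, 2))
```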
mindmeld.models.query_features.extract_freq(bins=5, **kwargs)[source]

Extract frequency bin features.

Parameters: bins (int) -- The number of frequency bins (besides OOV)
Returns: A feature extraction function that returns the log of the count of query tokens within each frequency bin.
Return type: (function)
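The following sketch illustrates the general idea of frequency-bin features: each query token is assigned to a bin by its training-set frequency (or to an OOV bucket), and the feature value is the log of the number of tokens in each bin. The binning formula, the feature keys, and the +1 inside the log are assumptions for this example; the module's actual binning scheme may differ.

```python
import math
from collections import Counter


def freq_bin_features(tokens, token_counts, bins=5):
    """Map tokens to frequency bins and return log-scaled counts per bin."""
    max_count = max(token_counts.values(), default=0)
    per_bin = Counter()
    for token in tokens:
        count = token_counts.get(token, 0)
        if count == 0:
            # Tokens never seen in training go to a separate OOV bucket.
            per_bin["freq|OOV"] += 1
        else:
            # Hypothetical linear binning over [1, max_count].
            bin_index = min(bins - 1, count * bins // (max_count + 1))
            per_bin["freq|bin:{}".format(bin_index)] += 1
    # Assumption: log(count + 1), so a single token still yields a non-zero value.
    return {key: math.log(n + 1) for key, n in per_bin.items()}


feats = freq_bin_features(
    ["the", "the", "zebra", "qwerty"], {"the": 10, "zebra": 1}, bins=5
)
```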
mindmeld.models.query_features.extract_gaz_freq(**kwargs)[source]

Extract frequency bin features for each gazetteer

Returns: A feature extraction function that returns the log of the count of query tokens within each gazetteer's frequency bins.
Return type: (function)
mindmeld.models.query_features.extract_in_gaz_feature(scaling=1, **kwargs)[source]

Returns a feature extractor that generates a set of features indicating the presence of query n-grams in different entity gazetteers. Used by the domain and intent classifiers when the 'in-gaz' feature is specified in the config.

Parameters:
  • scaling (int) -- A multiplicative scale factor to the ratio_pop and ratio features of the in-gaz feature set.
Returns:

Returns an extractor function

Return type:

function
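As an illustration of specifying the 'in-gaz' feature in a classifier config, the fragment below shows one plausible shape. Only the 'in-gaz' key and the scaling parameter come from the description above; the variable name, the surrounding keys, and the value 10 are assumptions and should be checked against the MindMeld configuration documentation.

```python
# Hypothetical classifier config fragment; key names other than 'in-gaz'
# and 'scaling' are assumed, not taken from this page.
DOMAIN_CLASSIFIER_CONFIG = {
    "model_type": "text",
    "model_settings": {"classifier_type": "logreg"},
    "features": {
        # Keyword arguments here would be forwarded to the extractor factory.
        "in-gaz": {"scaling": 10},
    },
}
```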

mindmeld.models.query_features.extract_in_gaz_ngram_features(**kwargs)[source]

Returns a feature extractor for surrounding ngrams in gazetteers

mindmeld.models.query_features.extract_in_gaz_span_features(**kwargs)[source]

Returns a feature extractor for properties of spans in gazetteers

mindmeld.models.query_features.extract_length(**kwargs)[source]

Extract length measures (tokens and chars; linear and log) on whole query.

Returns: (function) A feature extraction function that takes a query and returns the number of tokens and characters on linear and log scales
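A minimal sketch of length features on both scales might look like the following. The function name, the feature keys, and the +1 inside the log (to keep an empty query finite) are assumptions.

```python
import math


def length_features(tokens, text):
    """Return token and character counts on linear and log scales."""
    return {
        "tokens": len(tokens),
        "chars": len(text),
        # Assumption: log(x + 1) so that zero-length input does not raise.
        "tokens_log": math.log(len(tokens) + 1),
        "chars_log": math.log(len(text) + 1),
    }


lengths = length_features(["hi", "there"], "hi there")
```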
mindmeld.models.query_features.extract_ngrams(lengths=(1, ), thresholds=(0, ), **kwargs)[source]

Extract ngrams of some specified lengths.

Parameters:
  • lengths (list of int) -- The ngram length.
  • thresholds (list of int) -- frequency cut off value to include ngram in vocab
Returns:

(function) A feature extraction function that takes a query and returns ngrams of the specified lengths.
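Most functions in this module are factories: they take configuration and return the function that does the extraction. The sketch below shows that closure pattern for word n-grams with frequency cutoffs. All names, the feature-key format, the ngram_counts argument, and the way thresholds are paired with lengths (padding with 0) are assumptions for illustration.

```python
from collections import Counter


def make_ngram_extractor(lengths=(1,), thresholds=(0,), ngram_counts=None):
    """Return an extractor that counts n-grams meeting a frequency cutoff."""
    ngram_counts = ngram_counts or {}
    # Assumption: thresholds pair up with lengths; missing entries default to 0.
    padded = list(thresholds) + [0] * (len(lengths) - len(thresholds))

    def extractor(tokens):
        feats = Counter()
        for length, threshold in zip(lengths, padded):
            for i in range(len(tokens) - length + 1):
                ngram = " ".join(tokens[i:i + length])
                # Drop n-grams seen fewer than `threshold` times in training.
                if ngram_counts.get(ngram, 0) < threshold:
                    continue
                feats["ngram|length:{}|{}".format(length, ngram)] += 1
        return dict(feats)

    return extractor


extract = make_ngram_extractor(lengths=(1, 2))
ngram_feats = extract(["a", "b", "a"])
```

Returning a closure keeps the configuration (lengths, cutoffs, vocabulary counts) bound once, so the extractor itself only needs the query.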

mindmeld.models.query_features.extract_query_string(scaling=1000, **kwargs)[source]

Extract whole query string as a feature.

Returns: (function) A feature extraction function that takes a query and returns the whole query string for exact matching
mindmeld.models.query_features.extract_sentiment(analyzer='composite', **kwargs)[source]

Generates sentiment intensity scores for each query

Returns: (function) A feature extraction function that takes in a query and returns sentiment values across positive, negative and neutral
mindmeld.models.query_features.extract_sys_candidate_features(start_positions=(0, ), **kwargs)[source]

Return an extractor for features based on a heuristic guess of numeric candidates at/near the current token.

Parameters: start_positions (tuple) -- positions relative to current token (=0)
Returns: (function) The feature extractor.
mindmeld.models.query_features.extract_sys_candidates(entities=None, **kwargs)[source]

Return an extractor for features based on a heuristic guess of numeric candidates in the current query.

Returns: (function) The feature extractor.
mindmeld.models.query_features.extract_word_shape(lengths=(1, ), **kwargs)[source]

Extracts word shape for ngrams of specified lengths.

Parameters: lengths (list of int) -- The ngram length
Returns: (function) A feature extraction function that takes a query and returns ngrams of word shapes, for n of specified lengths.
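A word shape abstracts a token into a pattern of character classes. The sketch below uses one common convention (uppercase to "X", lowercase to "x", digits to "d"); this mapping and the function names are assumptions, and the module's own shape encoding may differ.

```python
def word_shape(token):
    """Map each character to a class symbol: X (upper), x (lower), d (digit)."""
    shape = []
    for ch in token:
        if ch.isdigit():
            shape.append("d")
        elif ch.isalpha():
            shape.append("X" if ch.isupper() else "x")
        else:
            # Punctuation and other symbols are kept as-is.
            shape.append(ch)
    return "".join(shape)


def shape_ngrams(tokens, lengths=(1,)):
    """Return n-grams over the word shapes of the tokens."""
    shapes = [word_shape(tok) for tok in tokens]
    grams = []
    for length in lengths:
        for i in range(len(shapes) - length + 1):
            grams.append(" ".join(shapes[i:i + length]))
    return grams


shape_feats = shape_ngrams(["Call", "555"], lengths=(1, 2))
```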
mindmeld.models.query_features.find_ngrams(input_list, n, **kwargs)[source]

Generates all n-gram combinations from a list of strings

Parameters:
  • input_list (list) -- List of strings to n-gramize
  • n (int) -- The size of the n-gram
Returns:

A list of ngrams across all the strings in the input list

Return type:

list
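One compact way to produce n-grams from a list of strings is to zip the list against shifted copies of itself. The sketch below does that; the function name and the choice to join each n-gram with a single space are assumptions.

```python
def find_ngrams_sketch(input_list, n):
    """Return all contiguous n-grams of input_list as space-joined strings."""
    # zip stops at the shortest shifted copy, which yields exactly
    # len(input_list) - n + 1 windows (or none if the list is too short).
    shifted = [input_list[i:] for i in range(n)]
    return [" ".join(gram) for gram in zip(*shifted)]


bigrams = find_ngrams_sketch(["a", "b", "c", "d"], 2)
```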

mindmeld.models.query_features.update_features_sequence(feat_seq, update_feat_seq, **kwargs)[source]

Update a list of features with another parallel list of features.

Parameters:
  • feat_seq (list of dict) -- The original list of feature dicts which gets mutated.
  • update_feat_seq (list of dict) -- The list of features to update with.
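The in-place update of parallel feature lists can be sketched as below. The function name is hypothetical, and the behavior when the two lists differ in length (extra entries are ignored) is an assumption.

```python
def update_features_sequence_sketch(feat_seq, update_feat_seq):
    """Merge each dict in update_feat_seq into the dict at the same index."""
    # zip pairs entries by position; feat_seq is mutated, nothing is returned.
    for feats, updates in zip(feat_seq, update_feat_seq):
        feats.update(updates)


token_feats = [{"word": "book"}, {"word": "table"}]
update_features_sequence_sketch(token_feats, [{"in_gaz": 0}, {"in_gaz": 1}])
```

After the call, each dict in token_feats carries both its original "word" entry and the new "in_gaz" entry.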