Working with User-Defined Features
==================================

In addition to the features available for each NLP classifier in MindMeld, you can also define your own custom feature extractors that are relevant to your application. User-defined features must follow the same format as MindMeld's built-in features. In this section, we will examine the components of a feature extractor function and explain how to write your own custom features.

.. _custom_features:

Custom Features File
--------------------

Start by creating a new Python file, say ``custom_features.py``, that will contain the definitions of all your custom feature extractors. If your MindMeld project was created using the "template" blueprint or adapted from an existing blueprint application, you should already have this file at the root level of your project directory. If you created your MindMeld project from scratch, you can refer to any of the blueprints for an example of the custom features file.

In order to use your custom features, the custom features file must be imported in the ``__init__.py`` file. For example, in the Home Assistant blueprint app you can import a custom features file named ``custom_features.py`` by adding the following line to the ``__init__.py`` file.

.. code-block:: python

   import home_assistant.custom_features

You can then reference your newly defined features in the classifier configurations you specify in the application configuration file, ``config.py``.

The Natural Language Processor uses two kinds of features. **Query features** can be used in domain, intent, and entity model configs, and are extracted by feature extractors that operate on the entire input query. **Entity features**, on the other hand, can only be used in the role classifier config, and are extracted by feature extractors that operate on a single extracted entity. An example of each kind of feature extractor is provided in the following sections.
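The import matters because feature extractors are registered through decorators (described in the following sections), and a decorator only runs when the module that contains it is executed, which is what an import does. The sketch below illustrates this general pattern with a hypothetical registry. ``FEATURE_REGISTRY``, ``register_feature``, and the ``query-length`` feature are stand-ins invented for illustration; they are not MindMeld's actual implementation, and the query here is a plain string rather than a MindMeld query object.

.. code-block:: python

   # Hypothetical stand-in for a decorator-based feature registry.
   # None of these names come from MindMeld.
   FEATURE_REGISTRY = {}


   def register_feature(feature_name):
       """Return a decorator that stores the decorated function under feature_name."""
       def _decorator(func):
           FEATURE_REGISTRY[feature_name] = func
           return func
       return _decorator


   # Registration happens as a side effect of executing this definition,
   # which is what importing the enclosing module does.
   @register_feature(feature_name='query-length')
   def extract_query_length(**args):
       def _extractor(query, resources):
           # In this sketch the query is a plain string
           return {'query_length': len(query)}
       return _extractor


   # A configuration can now refer to the extractor by its registered name.
   extractor = FEATURE_REGISTRY['query-length']()
   features = extractor('turn on the lights', {})

If the module defining ``extract_query_length`` were never imported, ``FEATURE_REGISTRY`` would stay empty and a lookup by name would fail. The ``__init__.py`` import is what avoids the equivalent situation in a MindMeld app.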
To summarize, in order to implement and use your own custom features, you must do the following:

- Define your feature extractors in a ``.py`` file (referred to as the *custom features file*).
- Import the custom features file in ``__init__.py``.
- Add your newly defined feature names to the ``'features'`` dictionary within a classifier configuration.

Example of a Query Feature Extractor
------------------------------------

Each feature extractor is defined as a Python function that returns an inner ``_extractor`` function. This ``_extractor`` function performs the actual feature extraction. The following code block shows an example of a query feature extractor that computes the average token length of an input query.

.. code-block:: python

   from mindmeld.models.helpers import register_query_feature


   @register_query_feature(feature_name='average-token-length')
   def extract_average_token_length(**args):
       """
       Example query feature that gets the average length of normalized tokens in the query

       Returns:
           (function) A feature extraction function that takes a query and
               returns the average normalized token length
       """
       def _extractor(query, resources):
           tokens = query.normalized_tokens
           # Guard against an empty query to avoid dividing by zero
           if not tokens:
               return {'average_token_length': 0.0}
           average_token_length = sum(len(t) for t in tokens) / len(tokens)
           return {'average_token_length': average_token_length}

       return _extractor

Let's take a closer look at the salient parts of a feature extractor.

1. The ``@register_query_feature`` decorator at the top registers the feature with MindMeld.

   .. code-block:: python

      @register_query_feature(feature_name='average-token-length')

   The ``feature_name`` parameter specifies the name by which the extractor will be referenced in the app's configuration file, ``config.py``. The feature name must be added as a key within the ``'features'`` dictionary of the classifier config, as shown below. If the feature extractor function has parameters, the corresponding value in the key-value pair must specify these parameters. If there are no parameters, as in this case, an empty dictionary is sufficient.

   .. code-block:: python
      :emphasize-lines: 15

      DOMAIN_CLASSIFIER_CONFIG = {
          ...
          ...
          ...

          'features': {
              "bag-of-words": {
                  "lengths": [1, 2]
              },
              "edge-ngrams": {"lengths": [1, 2]},
              "in-gaz": {},
              "exact": {"scaling": 10},
              "gaz-freq": {},
              "freq": {"bins": 5},
              "average-token-length": {},
          }
      }

2. The arguments passed to the feature extractor can be accessed by the inner ``_extractor`` function.

   .. code-block:: python

      def extract_average_token_length(**args):

   The values of the parameters must be specified in the ``'features'`` dictionary of the classifier config as values corresponding to the appropriate feature keys.

3. The feature extractor returns an ``_extractor`` function which encapsulates the actual feature extraction logic.

   .. code-block:: python

      def _extractor(query, resources):

   Query feature extractors have access to the ``query`` object, which contains the query text, normalized query tokens, and system entity candidates.

4. The ``_extractor`` function must return a dictionary mapping feature names to their corresponding values.

   .. code-block:: python

      return {'average_token_length': average_token_length}

Example of an Entity Feature Extractor
--------------------------------------

Entity features are similar to the query features described above, with a few key differences. The most important distinction is that entity features can only be used by the role classifier. Specifying an entity feature in the domain classifier, intent classifier, or entity recognizer config will raise an error.

There are two other differences.

1. Entity features are registered using a different decorator, ``@register_entity_feature``.

2. The inner ``_extractor`` function of an entity feature extractor receives an ``example`` object that contains information about the query and the extracted entities.

   .. code-block:: python

      def _extractor(example, resources):
          query, entities, entity_index = example

   The ``query`` object is the same as above, ``entities`` is a list of all the entities detected in the query, and ``entity_index`` specifies which of the ``entities`` the extractor function is currently operating on.

Here's an example of an entity feature extractor that computes the starting token index for a given entity.

.. code-block:: python

   from mindmeld.models.helpers import register_entity_feature


   @register_entity_feature(feature_name='entity-span-start')
   def extract_entity_span_start(**args):
       """
       Example entity feature that gets the start of the token span for the given entity

       Returns:
           (function) A feature extraction function that returns the span start of the entity
       """
       def _extractor(example, resources):
           query, entities, entity_index = example
           features = {}

           # Look up the entity this call is operating on
           current_entity = entities[entity_index]
           current_entity_token_start = current_entity.token_span.start

           features['entity_span_start'] = current_entity_token_start
           return features

       return _extractor
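To check the extraction logic without training a model, the inner function can be called directly on stand-in objects. The following is a minimal sketch and not part of MindMeld: ``FakeSpan``, ``FakeEntity``, and the pass-through ``register_entity_feature`` defined here are hypothetical placeholders that mimic only the attributes this extractor reads, and the query is represented by a plain string.

.. code-block:: python

   from collections import namedtuple

   # Hypothetical stand-ins exposing only the attributes the extractor reads.
   FakeSpan = namedtuple('FakeSpan', ['start', 'end'])
   FakeEntity = namedtuple('FakeEntity', ['text', 'token_span'])


   def register_entity_feature(feature_name):
       """Pass-through placeholder for the real decorator, so the sketch runs without MindMeld."""
       def _decorator(func):
           return func
       return _decorator


   @register_entity_feature(feature_name='entity-span-start')
   def extract_entity_span_start(**args):
       def _extractor(example, resources):
           query, entities, entity_index = example
           current_entity = entities[entity_index]
           return {'entity_span_start': current_entity.token_span.start}
       return _extractor


   # In "set the alarm for 6 am tomorrow", the entity "6 am" covers tokens 4 to 5.
   entities = [FakeEntity('6 am', FakeSpan(4, 5))]
   example = ('set the alarm for 6 am tomorrow', entities, 0)

   extractor = extract_entity_span_start()
   features = extractor(example, resources={})

Here ``features`` is ``{'entity_span_start': 4}``, since the entity selected by ``entity_index`` begins at the fifth token. Calling the extractor this way exercises only the feature logic; it does not confirm that the feature is registered or referenced correctly in ``config.py``.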