Recent Changes¶

MindMeld 4.5¶

Warning

This release includes breaking changes. See below for instructions on migrating your apps from MindMeld 4.4 to MindMeld 4.5

1. Support for Deep Learning Models

MindMeld now supports a suite of deep neural architectures for domain and intent classification as well as for entity recognition in its NLP pipeline. This also includes support for Pretrained Transformer Models (PTMs) available as part of the Huggingface models hub. Similar to their shallower counterparts, the deep models also support configurable parameters to alter the model architecture and the training process.

2. Active Learning for Entity Recognition

MindMeld's active learning tool now has support for entity recognition models. This enables teams to automatically select the most informative subsets from large query datasets based on entities present in the queries.

3. Improvements to caching mechanism and tokenization performance

In version 4.5, we have improved the performance for multilingual tokenization and also worked on some modifications and bugfixes to the caching mechanism used for models and queries. Due to these changes, older saved models will no longer load in 4.5. Please make sure to delete the .generated folder in the top level of the application and re-build the application.

MindMeld 4.4¶

MindMeld 4.4 packages several new features that help improve NLP performance of applications with a small amount of data. It continues to add deeper support for multilingual applications by adding East-Asian tokenization support and also adds a new Spanish blueprint.

1. Paraphrasing

Small amounts of training data lead to NLP inaccuracies. Paraphrasing allows application developers to increase the size of their training data up to 10x by augmenting it using paraphrased sentences. This capability is offered in select languages, including English and Spanish. More details can be found here.

2. Automatic Annotation

Annotating entities in training data manually can be very time consuming. Using MindMeld's auto annotator one can efficiently annotate both system entities and custom entities quickly.

3. East Asian Tokenization

Unlike Latin-script languages, words in Japanese, Chinese and Korean (among others) are not separated by spaces. MindMeld has updated its internal processing pipeline to handle such languages.

4. Non-Elasticsearch Entity Resolution and Question Answering

MindMeld has implemented a QA component that does not rely on Elasticsearch for information retrieval. Similarly for Entity Resolution, configurable TFIDF-based and pretrained embedder-based resolvers are implemented in addition to the already available Elasticsearch-based resolution.

5. Spanish Blueprint

MindMeld has a new Spanish blueprint to aid with development for non-English MindMeld applications.

6. Active Learning

Active learning is a modality-independent approach for improving the data acquisition capabilities for all machine learning models. MindMeld's active learning tool empowers teams to automatically select the most informative subsets from large datasets and unannotated user logs from production machine learning systems. Using this tool leads to efficient and accurate models as well as massively reduces the annotation time and costs for new data.

7. Query Caching

MindMeld supports query caching to reduce training time and the memory footprint of the NLP pipeline.

8. MindMeld input validation

MindMeld supports consistent input validation for all of its APIs.

9. DagsHub integration

MindMeld has integrated with DagsHub to support MLOps use-cases like sharing model experimental results, experiments tracking and visualizations. To get started, run the following command for your app: `python python -m <YOUR_APP_PATH> dvc --setup_dagshub `

10. `deny_nlp` functionality for inference

MindMeld provides functionality to block certain NLP components like domains and intents from being inferred by passing the deny_nlp flag to the nlp.process API.

11. Updated Duckling

MindMeld has updated its Duckling dependency to the latest changes made upstream of it.

MindMeld 4.3¶

MindMeld 4.3 provides developers tools to build powerful question-answering systems, integrate with external clients, a new enterprise blueprint, and automatic slot filling. To read more about the latest changes in MindMeld 4.3, check out our announcement on the MindMeld Blog.

MindMeld 4.2¶

MindMeld 4.2 packages several new features to make it easier for developers to build NLP applications for non-English languages, do unstructured QA searches, and a new enterprise blueprint for a human resources (HR) use-case.

1. MindMeld UI

MindMeld UI is a sample web-based chat client interface to interact with any MindMeld application. This web UI also serves as a debugging tool to step through the various stages of query processing by the MindMeld pipeline. See MindMeld UI for more details.

2. Question-answering on unstructured text

MindMeld 4.2 includes a built-in Question-Answering (QA) component using Elasticsearch for unstructured text retrieval. This new feature can be used to perform QA using a knowledge base of passages, frequently asked questions or any long-form text data. This complements the structured text retrieval already supported in MindMeld for knowledge-base search. See dealing with unstructed data for more details.

3. New Human Resources Blueprint

MindMeld 4.2 provide an enterprise Human Resources bot blueprint to complement the existing consumer blueprints we currently support. Refer to HR assistant blueprint for more details.

4. Webex Teams Bot Integration

MindMeld 4.2 includes built-in support for Webex Teams integration, so developers can seamlessly integrate MindMeld bots to Webex Teams. See Webex bot integration for implementation details.

5. Locale and Language codes

MindMeld 4.2 now supports system entity classification and resolution in non-English languages. Please see Specify language and locale codes for more details.

6. Stemming

MindMeld 4.2 supports language stemmers.

7. DialogueFlow.reprocess

MindMeld 4.2 includes an improvement to DialogueFlow (a MindMeld dialogue feature) where the user can exit the current dialogue flow and return to a default flow. Refer to Exiting Dialogue Flow section on how to exit an active dialogue flow.

8. Docker updates

We updated the Getting started with docker page to spawn Elasticsearch within the docker container, which means the developer doesn't have to do any local Elasticsearch setup, thus significantly reducing the local dependencies needed to run MindMeld.

MindMeld 4.1¶

Warning

This release includes breaking changes. See below for instructions on migrating your apps from MindMeld 4.0 to MindMeld 4.1

MindMeld 4.1 allows the package to be open-sourced by complying to the Apache 2.0 license standard.

1. De-coupled Duckling from MindMeld

Duckling, the numerical parser used to detect system entities, is now a configurable option, so an application can disable it if it doesn't need it. See configuring system entities section for more details.

2. Added extensive API documentation for the MindMeld library

The API reference for the MindMeld package can be found here: API Reference.

3. Replaced all instances of the term mmworkbench to mindmeld

All instances of the term mmworkbench in the codebase have been replaced to mindmeld to be consistent with the new open-source package name. Due to this change, older saved models will no longer load in 4.1. Please make sure to delete the .generated folder in the top level of the application and re-build the application.

MindMeld 4.0¶

Warning

This is a major release that includes breaking changes. Refer to the changes numbered 6, 9, and 10 below for instructions on migrating your apps from MindMeld 3 to MindMeld 4.

MindMeld 4 is a major update to the MindMeld conversational AI platform, adding a number of new features to the natural language processor and dialogue manager components. This section provides highlights; see Package History for the full release notes.

1. Robustness to ASR errors

Conversational applications that support voice inputs use an automatic speech recognition (ASR) system to convert the input speech into text and then send the resulting transcript to the MindMeld NLP pipeline. ASRs often make errors, especially on domain-specific vocabulary and proper nouns which can in turn adversely affect the accuracy of the NLP classifiers. MindMeld 4 introduces a couple of new techniques to make the entity processing steps (recognition and resolution) more resilient to ASR errors. Read the new chapter on Dealing with Voice Inputs for more details.

2. Improved recognition of numerical entities

MindMeld 4 uses the actively maintained Duckling library for recognizing numerical entities. The new Haskell-based version is faster and more robust than the deprecated Java-based version that was used in MindMeld 3. There are minor changes to the MindMeld system entity recognizer's parse_numerics() method as a result. See the system entities section.

3. Dynamic gazetteers

Gazetteer-based features have a significant impact on NLP accuracy since they provide a very strong signal to the classification models. This is especially true for entity recognition. In addition to the static gazetteers used by the NLP classifiers at training time, MindMeld 4 introduces the ability to dynamically inject new entries into the gazetteers at runtime to further aid the model in making the right prediction. The section on dynamic gazetteers in the dialogue manager chapter describes when and how to use this new functionality.

4. New features for text classification

MindMeld 4 adds three new feature extractors for the domain and intent classifiers:

The 'word-shape' feature encodes information about the presence of capitalization, numerals, punctuation, etc. in the input query.
The 'sys-candidates' feature indicates the presence of system entities in the query. This feature extractor was only available to the entity recognizer in previous versions.
The 'enable-stemming' feature extracts stemmed versions of the query tokens in addition to the regular bag-of-words features.

Refer to the "Feature Extraction Settings" section of the domain and intent classifier chapters for more details.

5. Support for user-defined features

If the standard set of available features for the various classifiers isn't adequate for your use case, MindMeld now allows you to define your own custom feature extractors and use them with the NLP models. See the new chapter on Working with User-Defined Features.

6. Improvements to model debugging

The predict_proba() method is now available for the entity recognizer and the role classifier as well. The entity recognizer's predict_proba() method outputs a confidence score for each detected entity. The role classifier's predict_proba() method returns a probability distribution across all the possible role labels for a given entity. See the relevant sections in the entity recognizer and role classifier chapters.

While training a new model or investigating classification errors, it is useful to view the features used by the model to make sure they are being extracted correctly. To enable this, each classifier in the MindMeld NLP hierarchy now exposes a view_extracted_features() method that dumps all the features extracted from a given query. See the section titled "Viewing features extracted for classification" for each NLP classifier.

To make MindMeld's model inspection capabilities more user-friendly, the internal representation of all extracted features has been modified to make the output of nlp.inspect() and view_extracted_features() methods easier to comprehend. Due to this change, models trained and saved using MindMeld 3 cannot be loaded in MindMeld 4. You need to train your models afresh on MindMeld 4.

Warning

NLP models trained on MindMeld 3 cannot be loaded by MindMeld 4.

Tip

After installing MindMeld 4, follow these steps to upgrade your old project:

Modify your app's project structure to comply with the newly introduced modular project structure.
Clear all the previously trained models by running python -m APP_NAME clean.
Rebuild all models by running python -m APP_NAME build or running nlp.build() in a Python shell.

7. Dialogue flows

MindMeld 4 introduces a new construct called Dialogue Flow for easily structuring conversation flows where the user needs to be directed towards a specific end goal in a focused manner. See the new Dialogue Flows section in the Dialogue Manager chapter.

8. Asynchronous dialogue state handlers and middleware

To improve the performance and scalability of complex applications that depend on remote services, MindMeld 4 supports asynchronous execution of dialogue state handling logic. Read the section on Asynchronous Dialogue State Handlers and Middleware for more information.

9. New dialogue state handler interface

MindMeld 4 introduces a new dialogue state handler interface that makes an explicit mutability distinction between the data being passed into the dialogue manager from the client and the natural language processor (immutable) and the output data written by the dialogue state handlers and sent back to the client (mutable). This distinction is useful in cases where a single request is handled by multiple dialogue state handlers in sequence, and it's important to keep track of both the original data passed into the dialogue manager and the new data being generated by the dialogue state handling logic. Here is an example of the new interface, where the request object is the immutable data passed into the handler and the responder object is the carrier of the mutable data written to by the handler:

@app.handle(intent='greet')
def welcome(request, responder):
   username = request.context.get('username', 'World')
   responder.reply('Hello ' + username)
   responder.frame['message'] = 'Hello ' + username

See the updated section in the dialogue manager chapter for more details on the request and responder objects.

Warning

The new dialogue state handler interface is incompatible with MindMeld 3 applications.

Tip

Previously, the application used the context and responder objects in its dialogue state handlers, e.g. def welcome(context, responder).

The context object has now been replaced by the immutable request object which cannot be written to. You can only perform write operations on the corresponding properties in the mutable responder object. You should write all your data to the appropriate responder object property instead of the context dictionary.

See the examples in the user guide and the blueprints.

10. New project structure

Previously, MindMeld required all application logic to be in a single file, app.py. As an application grows in complexity, this approach is not scalable. MindMeld 4 allows the application logic to be shared across multiple files. The home assistant blueprint is an example of this modularized approach, where the times_and_dates.py file handles all the logic for the time and date-related functionality.

In the new project structure, we introduce two files: __init__.py where you register all the application files as imports and __main__.py where you register the application command line interface. Read the updated section in the Step-by-Step Guide for more information.

Warning

The new project structure is incompatible with MindMeld 3 applications.

Tip

In the new modular application project structure, we require two files: __init__.py where you register all the application files as imports, and __main__.py where you register the application command line interface. You can still keep all the application logic in a single file (__init__.py); this is how we organize most of our blueprint applications except for Home Assistant.
If the app has all the dialogue state logic in app.py, rename the file to __init__.py. Add a new file called __main__.py, similar to __main__.py in Home Assistant.
To build and run the application, use the commands python -m my_app build and python -m my_app run from outside the application directory.

MindMeld 3.4¶

MindMeld 3.4 brings new functionality to the dialogue manager along with some improvements to the natural language processing pipeline. This section provides highlights; see Package History for the full release notes.

1. Dialogue middleware

MindMeld 3.4 provides a useful mechanism for changing the behavior of many or all dialogue states via middleware. Middleware are developer-defined functions that get called for every request before the matched dialogue state handler. The Dialogue Middleware section describes potential use cases for the middleware functionality and details on how to implement them.

2. Targeted-only and default dialogue state handlers

MindMeld 3.2 introduced the ability to skip NLP classification and pre-select a target dialogue state for the next conversational turn. In 3.4, you can further mark certain dialogue states as targeted_only to exclude them from consideration in regular non-targeted turns.

Additionally, you can now also explicitly denote a dialogue state handler as the default handler without worrying about where it appears in app.py. See the updated Dialogue Manager chapter for more details.

3. Different datasets for different NLP models

It is now possible to specify different sets of labeled query files for training or testing different classifiers in the NLP pipeline. This addresses a big limitation in the earlier versions of MindMeld. For instance, previously, you couldn't add data files under an intent folder and use them only for training the entity recognizer without also affecting the domain or intent models. MindMeld 3.4 gives you the flexibility to do so and hence have a finer control over the behavior of your individual classification models. Read more about the newly added Custom Train/Test Settings in the "Classifier configuration" section for each NLP classifier.

4. Frequency-based thresholding for n-gram features

MindMeld 3.4 allows you to specify a frequency threshold for n-gram feature extractors such as bag-of-words and char-ngrams to prevent rare n-grams from being used as features in your classification model. See Feature Extraction Settings under the "Classifier configuration" section for each NLP classifier.

5. Batch predictions

The MindMeld CLI has been updated with a new predict command that runs NLP predictions on a given set of queries using your app's trained models. The command is useful when you want to run your NLP models in batch on a dataset of queries or bootstrap expected labels in new queries for training. For instance, consider the case where you are preparing additional training data to improve your entity recognizer's performance. It is a lot easier to annotate your new training queries with your existing entity model and then manually correct any errors, than go through every new query and annotate the ground truth entities by hand from scratch.

MindMeld 3.3¶

MindMeld 3.3 contains many useful enhancements aimed at reducing the amount of time it takes to iterate on ML experiments and giving developers a finer-grained control over certain aspects of the application behavior. This section provides highlights; see Package History for the full release notes.

1. New feature types and inspection capabilities for NLP models

In addition to word n-grams, you can now use character n-grams as features for the domain classifier, intent classifier and entity recognizer. Refer to the "Feature Extraction Settings" section of each classifier for more details.

For the domain and intent classifiers, you can also use the newly-introduced feature inspection capability in MindMeld to view the learned feature weights for your trained models. See the section titled "Inspect features and their importance" for each classifier.

2. Improvements to NLP model training

Overriding global configuration: Depending on the characteristics and distribution of your training data across domains and intents, you might want to train a different kind of model for each domain, intent, or entity type in your application. This was not possible previously as you could only specify one global configuration for each classifier type in your NLP pipeline. Refer to the updated section on custom configurations to see how MindMeld 3.3 allows you to override these global settings on a model-by-model basis.

Incremental builds: Till version 3.2, every call to the NaturalLanguageProcessor.build() method kicked off a full build where MindMeld trained/retrained every NLP component from scratch across every domain, intent, and entity type in the project. From version 3.3 onwards, you can do an incremental build where the NaturalLanguageProcessor only trains those subset of models that have been affected by changes to the training data and associated resources. This significantly reduces the time to rebuild the NLP pipeline after small changes to the data. See building models incrementally.

3. Custom datasets

You can now create your own arbitrarily-named custom datasets in addition to the default 'train' and 'test' sets recognized by MindMeld. This allows you to store multiple datasets for your ML experiments and select the relevant dataset for use with each round of training or testing. See select data for experiments.

4. Improved support for dates and times

For applications dealing with temporal events, you can now specify the time zone and timestamp associated with each query to the NaturalLanguageProcessor to ensure accurate prediction of time-based system entities. See specifying request timestamp and time zone.

5. Preprocessor

The preprocessor is a new component that has been added to MindMeld in version 3.3. It allows developers to define any custom preprocessing logic that must be applied on each query before being processed by the NLP pipeline. Read more in the new user guide chapter on Preprocessing.

MindMeld 3.2¶

MindMeld 3.2 brings deep learning models to the MindMeld platform for the first time. This release also improves natural language processing and enhances dialogue management capabilities. This section provides highlights; see Package History for the full release notes.

1. Deep Learning for Entity Recognition (Beta)

You can now opt to train your entity recognizers with a Long Short Term Memory (LSTM) network build in TensorFlow. See Train an entity recognizer.

2. Support for targeted dialogue state handling

The dialogue manager now offers finer-grained control over the dialogue flow logic. You can specify rules that override or bias the output of the NLP classifiers to ensure that you reach a pre-determined dialogue state in the next conversational turn. See Targeted Dialogue State Handling.

3. Improved dialogue state handler interfaces

In version 3.2, the term directives replaces the term client actions found in previous versions. Also, the DialogueResponder class used in dialogue state handlers has been refactored to make its functions more intuitive. See responder.

For existing MindMeld 3.1 apps:

If the app used the responder.prompt() construct, change that to responder.reply() followed by a responder.listen().

If the app used the responder.respond() construct, change that to responder.direct().

4. Easy evaluation interface

The NaturalLanguageProcessor class now has an evaluate() method that runs model evaluation for all the components in the NLP pipeline. The MindMeld CLI has a corresponding evaluate command.

5. Conversational History Management

The history field of the context object used by dialogue state handlers is now maintained by MindMeld. Prior to 3.2, MindMeld assumed that the client would manage the conversational history by appending the necessary information to the history after each turn.

MindMeld 3.1¶

Warning

Upgrading some existing MindMeld 3.0 projects to MindMeld 3.1 will fail unless modified as described below.

MindMeld 3.1 has improved natural language processing and application logic management capabilities, along with enhancements and bug fixes. This section provides highlights; see Package History for the full release notes.

1. Consistent configuration format for NLP classifiers

The classifier configuration formats for the entity recognizer and the role classifier have been updated to be consistent with the domain and intent classifiers. See the relevant sections on entity recognizer training and role classifier training for the new format.

For existing MindMeld 3.0 apps:

If custom classifier configurations for the entity and role models are defined in the application configuration file (config.py), you must manually update those configurations to the 3.1 format.

If the app is based on a MindMeld blueprint, you can use the blueprint command to upgrade to the 3.1 format. Running this command will download the version of the blueprint that is compatible with the latest stable MindMeld release and overwrite your local copy. This means that if you have modified the blueprint, your modifications will be lost, so you should consider saving the modifications outside of your project and manually adding them back in after upgrading.

2. Support for modular dialogue state handling logic

Relative imports of arbitrary modules and packages are now supported within the application container file (app.py). This means that all application logic required for dialogue state handling need not be contained within a single Python file (app.py), as was the case with MindMeld 3.0. Because MindMeld loads each project as a Python package to support this new capability, every project folder must now have an empty __init__.py file at root level.

For existing MindMeld 3.0 apps:

Manually add an empty __init__.py file at the root of your project folder to ensure compatibility with MindMeld 3.1. You can use the blueprint command to overwrite previously-downloaded blueprints with the new 3.1-compatible versions.

To learn more about support for relative imports, see the application container section in Step 4 of the Step-by-Step Guide.

3. CRF for entity recognition

You now have the option of training your entity recognizers using a linear-chain conditional random field (CRF) instead of the default maximum entropy Markov model (MEMM). See entity recognizer training.

4. More models for role classification

You now have the option of training your role classifiers using any of the text models (namely, SVM, Decision Tree, and so on) instead of the default maximum entropy model. See role classifier training.

5. New metrics for entity recognition

Entity recognizer evaluation now exposes new metrics called segment-level errors. These make it easier to interpret and understand the model's sequence tagging performance. See entity recognizer evaluation.