Different Approaches for Building Conversational Applications

Developers and researchers have been building conversational applications, such as chatbots, for decades. Over the years, many different strategies have been considered. With the recent proliferation and widespread adoption of voice and chat assistants, standard approaches are finally emerging to help define the best practices for building useful, production-quality chatbots. This section outlines a few of the most common approaches for building conversational applications today and describes some of the pros and cons associated with each.

Rule-Based Approaches

Prior to the machine learning advances of the past decade, conversational applications were most commonly built using rule-based approaches. Developers unfamiliar with machine learning usually begin their implementations with rule-based logic. Today, several software frameworks, such as BotKit or Microsoft Bot Framework, are available to help developers get simple conversational services up and running. These frameworks provide scaffolding to host message-handling logic and plumbing to integrate with various bot client endpoints. They simplify the task of setting up a server process that listens for incoming text messages, and streamline the effort required to integrate with popular clients like Slack or Facebook Messenger.

With rule-based frameworks, the developer is responsible for implementing the core logic to interpret incoming messages and return helpful responses. This logic generally consists of a series of rules that specify which scripted response to return for a message that matches a specified pattern. Since rule-based frameworks provide no AI capabilities to parse or classify incoming messages, the developer must code all the necessary message processing and interaction logic by hand. Even simple applications commonly require hundreds of rules to handle the different dialogue states in a typical conversational interface.
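The kind of hand-written logic described above can be sketched in a few lines. This is a minimal illustration, not the API of BotKit, Microsoft Bot Framework, or any other framework: the names RULES, FALLBACK, and respond, along with the patterns and scripted responses, are assumptions made up for this example.

```python
import re

# Hypothetical rules: each pairs a regular expression with a scripted response.
# Rules are checked in order, so earlier rules take priority over later ones.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\bhours\b", re.IGNORECASE), "We are open from 9am to 5pm."),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye!"),
]

# Returned when no rule matches the incoming message.
FALLBACK = "Sorry, I did not understand that."


def respond(message: str) -> str:
    """Return the scripted response of the first rule whose pattern matches."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return FALLBACK


if __name__ == "__main__":
    for text in ["Hello there", "What are your hours?", "When do you close?"]:
        print(text, "->", respond(text))
```

Even this toy version hints at the maintenance burden. A rephrased request such as "When do you close?" falls through to the fallback until someone writes another rule, and a message like "hi, what are your hours?" matches two rules, so the answer depends entirely on rule order.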

Rule-based approaches are often the quickest way to build and launch a basic demo of a voice or chat assistant. Moving from demo to production, however, almost always exposes a multitude of corner cases, each of which must be handled with different rules. Conflicts and redundancies between rules further complicate this undertaking. Even for simple applications, the growing list of rules can quickly become prohibitively complex. Seasoned developers, familiar with the pitfalls of rule-based approaches, typically opt for one of the more sophisticated approaches discussed below.

Cloud-Based NLP Services

In the past few years, a variety of cloud-based Natural Language Processing (NLP) services have emerged that aspire to reduce the complexity associated with building basic language understanding capabilities. These services are intended to enable developers without machine learning or NLP expertise to create useful NLP capabilities. All of these services provide browser-based consoles which assist developers in uploading and annotating training examples. They also streamline the task of launching a cloud-based web service to handle and parse natural language requests. These services are generally provided by large consumer internet companies to entice developers to upload their training data and thereby help the service provider improve their own conversational AI offerings in the process. Among the NLP services currently available are Amazon Lex, Google's Dialogflow, Facebook's wit.ai, Microsoft LUIS, and IBM Watson Assistant.

Cloud-based NLP services offer a relatively straightforward path for developers to build conversational applications without requiring machine learning knowledge. As a result, they can be the fastest path for assembling a demo or prototype. Many of these services offer pre-trained models for popular consumer tasks like checking the weather, setting an alarm or timer, updating a to-do list, or sending a text message. This makes them well-suited for applications which simply need to replicate common consumer domains without customization. Since the pre-trained models generally offered by these services duplicate the freely available functionality in today's widely used consumer virtual assistants, companies aiming to leverage these pre-trained models to monetize their own business are likely to face an uphill battle.

For companies that need to build an application which goes beyond a simple demo and requires models other than generic, pre-trained consumer domains, cloud-based NLP services are typically not the best approach. Building language understanding models tailored to a particular application or domain requires training the models on thousands or millions of representative training examples. Cloud-based NLP services, since they are targeted at developers who are unlikely to have large amounts of training data, are generally intended for smaller data sets and simpler custom models. In addition, most cloud-based NLP services only support basic NLP tasks such as intent classification and entity recognition. Implementing the other required processing steps in a typical conversational workflow, such as entity resolution, language parsing, question answering, and knowledge base creation, is left up to the developer. Perhaps most importantly, for companies unable or unwilling to forgo legal ownership of user data when it is uploaded to the service provider cloud, these NLP services are generally not a viable option.

Machine Learning Toolkits

Machine learning researchers working on NLP and conversational applications typically rely on versatile and advanced machine learning toolkits. These toolkits provide low-level access to state-of-the-art algorithms, including deep learning models like LSTMs, RNNs, CNNs, and more. Popular machine learning toolkits include Google's TensorFlow and Microsoft Cognitive Toolkit.

For machine learning researchers, toolkits like these are indispensable, and serve as the foundation for much cutting-edge AI research performed today. For companies looking to deploy production conversational services, though, the toolkits often provide little help. First, although they provide access to the most advanced machine learning algorithms, they provide little or no representative training data likely to be useful for a given production application. This leaves developers to do all the heavy lifting associated with creating and managing training data themselves. Second, while the algorithm access they provide is low-level and deep, they offer none of the higher-level abstractions which can greatly streamline the task of constructing a conversational interface. As a result, even the most skilled machine learning engineers rarely succeed in building production-quality conversational applications using today's machine learning toolkits.

Conversational AI Platforms

With the rise of conversational applications in the past few years, a new technology approach has emerged that is geared toward companies and developers who need to create production-quality conversational experiences. Conversational AI platforms are machine learning platforms optimized for the task of creating conversational applications such as voice or chat assistants. While offering the flexibility and advanced capabilities of traditional machine learning toolkits, they are specifically adapted to streamline the task of building production conversational interfaces. MindMeld is widely recognized as a leading Conversational AI platform.

Conversational AI platforms differ from pure machine learning platforms in that they offer tools expressly designed for the machine learning steps in a typical conversational workflow. For example, tools for intent classification, entity recognition, entity resolution, question answering, and dialogue management are common components in Conversational AI platforms. Unlike cloud-based NLP services, Conversational AI platforms are intended for machine learning engineers at least somewhat familiar with data science best practices. As a result, Conversational AI platforms offer more advanced tools and more flexibility to train and analyze custom language understanding models around large sets of training data. Also unlike cloud-based NLP services, Conversational AI platforms do not require training data to be uploaded to a shared cloud infrastructure. Instead, they provide a flexible and versatile platform which ensures that data sets and trained models are locally managed and always remain the intellectual property of the application developer.

How Good is Good Enough?

With so many different approaches for building conversational applications, it can be difficult for companies to know which strategy is best. An optimal strategy surpasses the threshold of performance that ensures a positive user experience. Determining this baseline level of acceptability can be an especially confusing or daunting undertaking for conversational applications.

Conversational interfaces represent a new user interface paradigm that is unfamiliar and non-intuitive for many developers whose experience is in web or native applications. Conversational interfaces can be utterly unforgiving compared to traditional graphical user interfaces (GUIs). In a traditional GUI, the visual elements provide a mechanism to guide the user down an interaction path that leads to a positive experience. For conversational interfaces, no such visual guide exists. Instead, the user is typically presented with a microphone button or a text prompt and expected to figure out how to verbalize desired requests from scratch. Faced with such an open-ended prompt and little context, many users find themselves at a loss for words. Even worse, they tend to pose questions that the system is not designed to handle, leading to a fruitless and frustrating outcome.

Developers building conversational interfaces for the first time often attempt to follow the same practices they know from building traditional GUIs. That means building a minimum viable product (MVP) to capture a small subset of the envisioned functionality, and then submitting the MVP for user testing. For conversational interfaces, this approach almost inevitably fails. A minimal implementation of a conversational interface is typically built using a small subset of the training data that will eventually be needed in a production application. For example, consider an MVP built using ten percent of the training data eventually required. This application could only understand around ten percent of the typical language variations verbalized by users when they invoke your app. As a result, when you submit your app for user testing, nine out of ten users will fail on their first request. This abysmal performance might quickly sound the death knell for your project.

As it turns out, quick-and-dirty prototypes and limited-scale user testing are not particularly useful in assessing the utility of conversational applications. The only way to measure performance accurately is to enlist large-scale analytics to deterministically measure performance across the long tail of possible user interactions. This measurement methodology is what popular commercial virtual assistants like Siri, Cortana, Google Assistant, and Alexa rely on to ensure that their services meet a minimum threshold of acceptability before they launch any new features publicly. The methodology requires, first, having a large enough set of 'ground truth' data to reflect the lion's share of all possible user interaction patterns. Second, it requires automated testing, using the 'ground truth' data, to ensure that a high enough percentage of user queries return an acceptable response.

Users are unforgiving when evaluating a conversational interface. They expect to verbalize requests just as if speaking with another person. They then expect the system to respond with human-like accuracy. This typically means that conversational applications must be near-perfect. In practice, when a conversational interface cannot achieve accuracy of at least 95%, users are likely to conclude that the app is dimwitted and never use it again.

Observe the following guidelines to ensure that your conversational interface accounts for the unique characteristics of conversational applications and meets a minimum threshold of acceptability before going live.

1. Select a use case that mimics a familiar, real-world interaction so that users intuitively know the types of questions to ask. Selecting an unrealistic or incorrect use case will render even the smartest app dead on arrival.
2. Generate a large enough set of 'ground truth' training data to ensure that the vast majority of user interactions can be captured and measured. Dipping your toe in the water does not work. Real-world accuracy can only be evaluated after you take the plunge.
3. Employ large-scale analytics to ensure that your application achieves at least 95% accuracy across the long tail of possible user interactions. Spot checking and small-scale user testing cannot expose long-tail corner cases which might fatally undermine overall accuracy.
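
The automated measurement described in the second and third guidelines can be sketched as a small test harness. This is only an illustration under stated assumptions: the function names, the toy keyword_intent classifier, and the four labeled examples are hypothetical stand-ins, and a real ground truth set would need to be far larger to cover the long tail. Only the 95% threshold comes from the text above.

```python
from typing import Callable, Iterable, Tuple


def accuracy(predict: Callable[[str], str],
             ground_truth: Iterable[Tuple[str, str]]) -> float:
    """Return the fraction of (query, expected_label) pairs predicted correctly."""
    examples = list(ground_truth)
    if not examples:
        raise ValueError("ground truth set is empty")
    correct = sum(1 for query, expected in examples if predict(query) == expected)
    return correct / len(examples)


def meets_threshold(score: float, threshold: float = 0.95) -> bool:
    """Check a measured accuracy against the minimum acceptable accuracy."""
    return score >= threshold


def keyword_intent(query: str) -> str:
    """Toy stand-in for a real model: guesses the intent from a single keyword."""
    text = query.lower()
    if "weather" in text:
        return "check_weather"
    if "alarm" in text:
        return "set_alarm"
    return "unknown"


# Hypothetical labeled examples; a production set would be far larger.
GROUND_TRUTH = [
    ("what's the weather today", "check_weather"),
    ("set an alarm for 7am", "set_alarm"),
    ("wake me up at 7", "set_alarm"),
    ("will it rain tomorrow", "check_weather"),
]

if __name__ == "__main__":
    score = accuracy(keyword_intent, GROUND_TRUTH)
    print(f"accuracy={score:.2f} ready_to_launch={meets_threshold(score)}")
```

In this toy run the keyword classifier handles the two literal phrasings and misses the two paraphrases, so it scores 50% and fails the check. Spot checking with only the first two queries would have suggested a perfect system, which is the gap that measurement against a broad ground truth set is meant to expose.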