Conversational AI platform for deep-domain voice interfaces and chatbots

Rafael Campos
08/21/2019

AI and NLU are revolutionizing how engineers operate and troubleshoot their network infrastructure.

Monday, 11:09 pm
ABC Bank Inc. Headquarters

An alarm goes off at ABC Bank's Network Operations Center, interrupting the soft hissing of the server fans and making Jack almost choke on his tuna sandwich. His shift had begun just a couple of hours before and he was hoping for a quiet night, maybe just the usual calls from users pulling all-nighters who had forgotten their passwords.

He glances at one of the large monitors in front of him and frowns at the big red dot signaling that something is not quite right. Instinctively, he asks his assistant: "Hey Webex, ask the network what’s going on". It responds immediately: "There's an issue with your application CorporateERP. Here is a graph of your application's performance over the last 60 minutes." The graph on the screen shows a big spike in the calls-per-minute metric that had occurred just a few minutes before.

Jack thinks hard trying to figure out what could be happening. He's aware of some routing issues that have been causing trouble lately, but he can't tell for sure if that's the root cause. He decides to ask his assistant for help: "Hey Webex, ask the network to do a path trace from the border router to server 1". It responds a couple of seconds later: "Here is the path trace you requested" while a network diagram appears on the screen, showing the path the network traffic takes to reach the application server where the ERP resides. Everything looks normal, except for the fact that there's an unusual delay between two of the routers in the path. This makes Jack believe that maybe the problem has to do with one of those routers.

He decides to escalate the case by opening a ticket and letting one of the level 2 routing engineers take care of the incident. He asks his assistant: "Hey Webex, ask the network to open a support ticket for this issue". It takes just a few seconds for the assistant to answer: "Your ticket number is INC0010093", while showing on the screen the details of the ticket just opened. Relaxed, at last, Jack says: "Hey Webex, thanks for your help!". "Anytime!" responds his assistant. Then, smiling, he takes another big bite from his tuna sandwich and goes on with his work.

Ok, of course, this is yet another futuristic, totally conceptual, too-good-to-be-true use case, way out of the reach of current technology. Right? Well, not really. This actually exists today.

There are several technologies involved in this scenario. Following is a description of each one:

Cisco MindMeld: Open Source Conversational AI Platform. Provides Natural Language Understanding (NLU) capabilities.
Cisco Webex Assistant: A voice-driven virtual assistant. Provides the Automatic Speech Recognition (ASR) functionality and integration with Webex video endpoints.
Cisco DNA Center: The network management system, foundational controller, and analytics platform at the heart of Cisco's intent-based network.
Cisco AppDynamics: An application performance management solution for monitoring, analyzing and optimizing complex environments.
ServiceNow: A cloud-based service management solution that provides the ticketing system for this scenario.

The following diagram illustrates how the different parts integrate to provide the rich experience showcased above:

The various technologies powering the experience

Cisco MindMeld Conversational AI Platform

MindMeld is a Python-based machine learning framework optimized for building advanced conversational assistants with a deep understanding of particular use cases or domains. MindMeld provides an advanced Natural Language Processing engine, capable of domain and intent classification, entity recognition and resolution, dialogue management and advanced question answering.

For the Network Operations Center (NOC) use case, we trained the ML model to work with the following NLP model hierarchy:

NLP hierarchy for the NOC MindMeld application

Let's analyze each layer in the hierarchy:

Domain classification

Because this use case has to do only with NOC-related topics, we are recognizing only one domain, called simply the "noc" domain.

Intent classification

For this specific use case we defined the following three intents:

Application status: Provides information on the status of the applications monitored by AppDynamics. We use the AppDynamics REST API to retrieve live and historical data about the applications' performance in order to determine if there is any degradation in the network and to generate the graphs that are shown to the user.

Some sample queries for this intent would be:
- "Show the application status"
- "What's the status of the application?"
- "What's going on with the network?"

Path tracing: It allows the user to check whether or not there is connectivity between two given endpoints in the network. It uses the DNA Center REST API to instruct the system to perform the different traces and to retrieve the results. The endpoints are recognized as entities by the NLP engine, which provides the user with lots of flexibility to customize the path trace to their needs.

The following queries would be classified under this intent:
- "Do a path trace from the core switch to server 2"
- "Please perform a traceroute from the border router to the second server"

Ticket creation: It leverages ServiceNow's REST API to create a ticket for an incident based on a predefined template. The description of the ticket can be customized depending on the nature of the issue.

These are some of the training phrases used for this intent:
- "Please create a support ticket"
- "I need to open a ticket"
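
To make the ticket-creation step more concrete, here is a minimal sketch of how such a request could be put together in Python. It assumes ServiceNow's Table API endpoint for incidents and basic authentication; the instance name, the credentials and the helper name build_incident_request are hypothetical, and the actual integration described in this article may well look different. The sketch only builds the request and never sends it.

```python
import base64
import json
import urllib.request


def build_incident_request(instance, user, password, short_description, description=""):
    """Build (but do not send) a POST request that would create an incident.

    Assumption: the instance exposes the Table API at /api/now/table/incident
    and accepts basic authentication. All argument values are placeholders.
    """
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    payload = json.dumps(
        {"short_description": short_description, "description": description}
    ).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Hypothetical values, for illustration only.
request = build_incident_request(
    "example-instance",
    "noc-bot",
    "not-a-real-password",
    "CorporateERP performance degradation",
    "Spike in calls per minute; unusual delay between two routers in the path.",
)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to plug in a predefined ticket template and to check the payload before anything is sent.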

Entity recognition

Note that, in the case of the path-tracing intent, the user has to specify the origin and the destination of the trace in their query. These two pieces of information are known as "entities" and are like variables that have to be resolved to specific values by the NLP engine. Once resolved, the business logic can process that data as needed to produce the desired output.

In order for the NLP engine to be able to recognize these entities, it has to be trained with a list of possible values for each entity. In our use case, there is a single entity, the device, which appears twice in a query: once as the device where the trace starts (source device) and once as the device where it ends (destination device).

This is the network topology the user refers to when choosing the source and destination devices for the trace:

An example network topology

Note that the user could potentially use very different words to specify the devices. That's why the model has to be trained with as many examples as possible of different ways the user might refer to the entities. Here is a list of possible values for the "device" entity that were used to train the model:

router
gateway
border router
edge router
server 1
first server
server 2
second server
server one
server two
core switch
main switch
distribution switch
access switch 1
access switch 2
access switch one
access switch two

Role classification

Continuing with the path-tracing intent, note that we need both the source and the destination devices for the trace. This means that two different instances of the same entity have to be recognized for the query to make sense. To achieve this, the training phrases have to specify not only which words or phrases should be recognized as entities, but also the "role" each of these entities plays.

Following are some sample phrases from this intent:

do a path trace from the {router|device|source} to the {access switch 2|device|destination}
perform a traceroute from the {access switch 1|device|source} to the {server 2|device|destination}
trace route to the {server 2|device|destination} from the {access switch 1|device|source}
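
To show what this annotation carries, here is a small, standalone Python sketch that pulls the text, entity type and role out of a phrase written in the {text|entity|role} format shown above. This is not MindMeld's own parser; the function name parse_annotated and the returned dictionary keys are our own choices for illustration.

```python
import re

# One annotation looks like {text|entity_type|role}; the role part is optional.
ANNOTATION = re.compile(r"\{([^|{}]+)\|([^|{}]+)(?:\|([^|{}]+))?\}")


def parse_annotated(phrase):
    """Return (plain_text, entities) for one annotated training phrase."""
    entities = []
    plain_parts = []
    cursor = 0      # position in the annotated phrase
    plain_len = 0   # length of the plain text built so far
    for match in ANNOTATION.finditer(phrase):
        before = phrase[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)
        text, entity_type, role = match.groups()
        entities.append(
            {"text": text, "type": entity_type, "role": role, "start": plain_len}
        )
        plain_parts.append(text)
        plain_len += len(text)
        cursor = match.end()
    plain_parts.append(phrase[cursor:])
    return "".join(plain_parts), entities


plain, found = parse_annotated(
    "trace route to the {server 2|device|destination} "
    "from the {access switch 1|device|source}"
)
for entity in found:
    print(entity["role"], "->", entity["text"])
```

Both entities have the same type ("device"); only the role tells the source apart from the destination, regardless of the order in which they appear in the phrase.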

Note that the user can potentially specify the source and destination devices in any order, depending on the phrasing chosen, so the training phrases should account for this possibility and include examples of both types.

MindMeld's unique approach

MindMeld is quite different from other NLP frameworks on the market today. First of all, the choice of programming language couldn't have been better. Python is one of the easiest and most powerful programming languages available today, with a huge number of libraries and documentation out there.

Then there's the extremely straightforward model-specification paradigm. To add a new intent to your model, just create a new folder with the name of the intent and put in it a text file called “train.txt” with the training phrases for that intent, and you're good to go. To create a new entity, add a folder to the "entities" directory with a text file called “gazetteer.txt” containing sample values for your entity.
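
As a rough illustration of that folder-based paradigm, the following Python sketch scaffolds such a layout in a temporary directory. We assume here that intent folders live under domains/<domain>/ and that the entity file is spelled gazetteer.txt; the helper scaffold_app and the intent name "create-ticket" are hypothetical and used only for this example.

```python
import tempfile
from pathlib import Path


def scaffold_app(root, domain, intents, entities):
    """Create one folder per intent and per entity, each with its text file.

    `intents` maps intent names to training phrases; `entities` maps entity
    names to sample values. The directory layout is an assumption.
    """
    root = Path(root)
    for intent, phrases in intents.items():
        intent_dir = root / "domains" / domain / intent
        intent_dir.mkdir(parents=True, exist_ok=True)
        (intent_dir / "train.txt").write_text(
            "\n".join(phrases) + "\n", encoding="utf-8"
        )
    for entity, values in entities.items():
        entity_dir = root / "entities" / entity
        entity_dir.mkdir(parents=True, exist_ok=True)
        (entity_dir / "gazetteer.txt").write_text(
            "\n".join(values) + "\n", encoding="utf-8"
        )
    return root


app_root = scaffold_app(
    tempfile.mkdtemp(),
    domain="noc",
    intents={
        "do-path-trace": [
            "do a path trace from the {router|device|source} "
            "to the {access switch 2|device|destination}",
        ],
        "create-ticket": ["Please create a support ticket", "I need to open a ticket"],
    },
    entities={"device": ["router", "border router", "server 1", "core switch"]},
)
created = sorted(p.relative_to(app_root).as_posix() for p in app_root.rglob("*.txt"))
print("\n".join(created))
```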

Another attribute that enhances the framework's flexibility is the fact that you can deploy MindMeld completely on-premises without the need to rely on the cloud for anything. This is especially useful in some security-sensitive environments where data should never leave the customer premises.

The entity recognition, role classification and entity resolution capabilities are extremely easy to use. For the path-trace intent, in just a few lines of code we are able to resolve both the source and destination devices to a canonical name that could be used in regular business logic code:

Dialogue state handler for the 'do-path-trace' intent
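
For readers who cannot see the original listing, here is a minimal sketch of what such a handler's core logic might look like. It is a reconstruction under assumptions, not the code used in the demo: we assume each recognized entity arrives as a dictionary with "type", "role" and "value" keys, where "value" is a list of resolved candidates carrying a "cname". The helper name extract_trace_endpoints is hypothetical.

```python
def extract_trace_endpoints(entities):
    """Return (source_device, destination_device) as canonical names.

    Assumed entity shape (modeled on what a MindMeld handler receives):
    {"text": ..., "type": "device", "role": "source" | "destination",
     "value": [{"cname": ...}, ...]}
    An entity that could not be resolved yields None for its role.
    """
    source_device = None
    destination_device = None
    for entity in entities:
        if entity.get("type") != "device":
            continue
        resolved = entity.get("value") or []
        cname = resolved[0]["cname"] if resolved else None
        if entity.get("role") == "source":
            source_device = cname
        elif entity.get("role") == "destination":
            destination_device = cname
    return source_device, destination_device


# In a MindMeld app, this would presumably be called from a handler such as:
#   @app.handle(intent="do-path-trace")
#   def do_path_trace(request, responder):
#       source_device, destination_device = extract_trace_endpoints(request.entities)
#       responder.reply("Here is the path trace you requested")

sample_entities = [
    {"text": "border router", "type": "device", "role": "source",
     "value": [{"cname": "router"}]},
    {"text": "second server", "type": "device", "role": "destination",
     "value": [{"cname": "server-2"}]},
]
source_device, destination_device = extract_trace_endpoints(sample_entities)
print(source_device, destination_device)
```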

While the user could have specified the device with loosely worded phrases such as "border router," "main switch" or "second server," the "source_device" and "destination_device" variables are guaranteed to hold only the following canonical values: router, core-switch, access-switch-1, access-switch-2, server-1, and server-2. This is achieved by specifying all possible synonyms for each entity value in a file called “mapping.json,” which contains a “whitelist” with the synonyms and their corresponding canonical name “cname”:

Entity mapping file
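
Since the original file is not reproduced here, the following Python sketch shows one plausible shape for it, using the canonical names and some of the sample values listed earlier. The top-level "entities" key and the grouping of synonyms under each cname are our assumptions, and the resolve helper is a plain dictionary lookup for illustration only; MindMeld's actual entity resolver is more sophisticated.

```python
import json

# Assumed structure: one entry per canonical name, with its synonyms.
mapping = {
    "entities": [
        {"cname": "router", "whitelist": ["gateway", "border router", "edge router"]},
        {"cname": "core-switch", "whitelist": ["core switch", "main switch"]},
        {"cname": "access-switch-1", "whitelist": ["access switch 1", "access switch one"]},
        {"cname": "access-switch-2", "whitelist": ["access switch 2", "access switch two"]},
        {"cname": "server-1", "whitelist": ["server 1", "server one", "first server"]},
        {"cname": "server-2", "whitelist": ["server 2", "server two", "second server"]},
    ]
}

# Reverse index: every synonym (and the cname itself) points to its cname.
SYNONYM_TO_CNAME = {}
for entry in mapping["entities"]:
    SYNONYM_TO_CNAME[entry["cname"].lower()] = entry["cname"]
    for synonym in entry["whitelist"]:
        SYNONYM_TO_CNAME[synonym.lower()] = entry["cname"]


def resolve(text):
    """Return the canonical name for a device phrase, or None if unknown."""
    return SYNONYM_TO_CNAME.get(text.strip().lower())


mapping_json = json.dumps(mapping, indent=2)  # what would go into mapping.json
print(resolve("border router"), resolve("second server"))
```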

Rafael Campos is the SVP of Growth Initiatives at Altus Consulting