Designing a new AI-based chat bot
A chatbot is a digital employee that can answer customers’ questions autonomously. Developing a chatbot to help Ericsson service engineers get 24/7 replies to their queries was the final semester project of Sudipta Bose, a Computer Science student from the Indian Institute of Technology, IIT (ISM), Dhanbad, India, during her internship at Ericsson Research in Chennai. In this blog post, Sudipta explains the basics of chatbots and the methods and technologies she used to build one.
The project was assigned to me on the basis of my profile: I had prior experience with the stemming of nouns (removing inflectional forms) in one of the Indian languages (Bengali), and an interest in Natural Language Processing (NLP) and Machine Learning algorithms.
Chatbot applications in different domains are shown below.
There are two broad categories of chatbots: rule-based chatbots and Artificial Intelligence (AI) based chatbots. Rule-based chatbots rely on a databank of rules and heuristics. They are best suited to industries such as travel and marketing. Generating the rules for a rule-based chatbot requires an in-depth understanding of the domain and considerable human effort. AI-based chatbots, on the other hand, employ AI algorithms to understand the context of user queries and generate answers to suit them. Hence, we developed a new AI-based chatbot to answer service engineers’ queries regarding internal Ericsson products.
Artificial Intelligence is any intelligence demonstrated by a machine that leads it to an optimal or suboptimal solution for a given problem. Machine Learning is a subset of AI that makes the machine learn from data. Deep Learning is a subfield of Machine Learning concerned with algorithms inspired by the structure and function of the brain. In deep learning, multiple layers process the features, and generally each layer extracts some piece of valuable information. Natural Language Processing is a component of Artificial Intelligence that enables the computer to understand and derive meaning from human language in order to process the information.
A chat application design defines the interaction between the user and the machine. The chatbot designer will consider an individual’s needs, the questions that will be prompted back to users for clarification, and the overall interaction.
Chat Application Development
To create the new chat application, we received only a small dataset of conversations between service engineers and the development team on a specific product, the relevant product documentation and Frequently Asked Questions (FAQs). As discussed, to understand and classify the context of a query, we tried three different models:
- a basic machine learning model, Support Vector Machine (SVM);
- a deep learning model, Convolutional Neural Network (CNN);
- a graphical model, Conditional Random Field (CRF).
To train the above models on the given dataset, each document is first pre-processed using NLP techniques such as stop word removal, stemming and document-term frequency matrix calculation. Because the training dataset is small, the resulting models had poor accuracy. Hence, we chose a model-free approach to solve this problem. Our approach requires three phases to create a new chat application:
- the training phase,
- the testing phase,
- the evaluation phase.
In training, we used the available dataset and relevant product documents to extract features. In testing, we used the learned features and the user's input query to produce responses, and we evaluated the responses against a gold standard.
The dataset contains queries posed by customers and answers provided by service engineers. We performed the following operations on the queries. First, we extracted the unique words from all sentences and found the position of each word in each sentence. The position of a word in a sentence plays a key role. For example, consider the sentence ‘Alex has only ten minutes to complete the job’. If the word ‘only’ moves, the meaning of the sentence also changes, e.g. ‘Only Alex has ten minutes to complete the job’. Hence, we chose to use the position in the sentence as an important feature. Using the word positions as matrix rows and the unique words as columns, we formed a matrix that records the number of times a particular word occurs at a particular position.
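A minimal sketch of such a position-by-word count matrix is given below. It assumes whitespace tokenization and lowercasing, which the description above does not specify, and the function name `position_word_matrix` is purely illustrative.

```python
import numpy as np

def position_word_matrix(sentences):
    """Count how often each unique word occurs at each position.

    Rows are word positions (0-based), columns are unique words.
    Tokenization by whitespace and lowercasing are assumptions.
    Expects at least one non-empty sentence.
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    column = {word: j for j, word in enumerate(vocab)}
    max_len = max(len(tokens) for tokens in tokenized)

    matrix = np.zeros((max_len, len(vocab)), dtype=int)
    for tokens in tokenized:
        for pos, word in enumerate(tokens):
            matrix[pos, column[word]] += 1
    return matrix, vocab

sentences = [
    "Alex has only ten minutes to complete the job",
    "Only Alex has ten minutes to complete the job",
]
matrix, vocab = position_word_matrix(sentences)
# 'only' is counted once at position 0 and once at position 2
print(matrix[:, vocab.index("only")])
```

The two example sentences share the same vocabulary, so only the rows in which a word is counted distinguish them.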
The next step is to assign a random 10-dimensional vector to each unique word in the data. We then multiply each word's random vector by the corresponding word position and sum the results to obtain a 10-dimensional vector for each sentence in the dataset. Finally, we normalize the values so that they lie between 0 and 1. This results in a ten-dimensional vector for each query available in the database.
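The vectorization step could be sketched as follows. Several details are assumptions rather than facts from the project: positions are counted from 1 (so the first word still contributes), the random values are drawn uniformly from [0, 1), and the normalization is a min-max scaling over the whole dataset. All function names are illustrative.

```python
import numpy as np

DIM = 10  # vector size stated in the text

def build_word_vectors(sentences, dim=DIM, seed=0):
    """Assign one random vector to every unique word (fixed seed for repeatability)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    return {word: rng.random(dim) for word in vocab}

def sentence_vector(sentence, word_vectors, dim=DIM):
    """Sum of position * word vector over the words of the sentence."""
    vec = np.zeros(dim)
    for pos, word in enumerate(sentence.lower().split(), start=1):
        if word in word_vectors:  # unknown words are skipped (assumption)
            vec += pos * word_vectors[word]
    return vec

def min_max_normalize(vectors):
    """Scale all values into [0, 1] using the global minimum and maximum."""
    vectors = np.asarray(vectors, dtype=float)
    lo, hi = vectors.min(), vectors.max()
    if hi == lo:
        return np.zeros_like(vectors)
    return (vectors - lo) / (hi - lo)

sentences = [
    "Alex has only ten minutes to complete the job",
    "Only Alex has ten minutes to complete the job",
]
word_vectors = build_word_vectors(sentences)
raw = [sentence_vector(s, word_vectors) for s in sentences]
query_vectors = min_max_normalize(raw)
print(query_vectors.shape)
```

Because the word positions act as weights, the two example sentences receive different vectors even though they contain exactly the same words.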
In addition to handling the position of a word in a sentence, we also considered one more challenge. If a customer poses a query that is not present in the dataset, the application may not be able to generate a good answer. To address this, we considered the keywords present in the product documents and the other supporting documents. First, we pre-process all the data by removing stop words, punctuation etc., as in any traditional NLP exercise. We then perform part-of-speech (POS) tagging on the dataset. In the next step, we collect instances where these words are present in the system and create a knowledge graph by connecting the nouns. The knowledge graph charts the relation between the keywords in the form of the entire sentence. For example, consider ‘Chris plays football with Julie’. After removing the stop words, we get ‘Chris plays football Julie’. Then, we identify the words that are categorized as verbs and nouns in the sentence. Finally, we collect the nouns in the sentence as keywords. In this sentence, the keywords are ‘Chris’, ‘football’ and ‘Julie’.
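The keyword extraction and graph construction could be sketched roughly as below. To keep the example self-contained, a tiny hand-written tag lookup (`POS_TAGS`) stands in for a real POS tagger, and the stop word list is a toy one. Both are assumptions for illustration. Representing each sentence as one edge that links all of its nouns and carries the full sentence as its label is also only one possible reading of the description.

```python
STOP_WORDS = {"with", "the", "a", "an", "of", "in"}  # toy list (assumption)

# Hypothetical stand-in for a real part-of-speech tagger.
POS_TAGS = {
    "chris": "NOUN",
    "julie": "NOUN",
    "football": "NOUN",
    "plays": "VERB",
}

def extract_keywords(sentence):
    """Strip punctuation and stop words, then keep only the nouns."""
    tokens = [t.strip(".,?!").lower() for t in sentence.split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    return [t for t in tokens if POS_TAGS.get(t) == "NOUN"]

def build_knowledge_graph(sentences):
    """One edge per sentence: (set of connected nouns, full sentence)."""
    edges = []
    for sentence in sentences:
        keywords = extract_keywords(sentence)
        if len(keywords) >= 2:  # need at least two nouns to form a relation
            edges.append((frozenset(keywords), sentence))
    return edges

def sentences_for(graph, keyword):
    """Return the sentences (edge labels) in which a keyword takes part."""
    return [label for nodes, label in graph if keyword.lower() in nodes]

graph = build_knowledge_graph(["Chris plays football with Julie"])
print(extract_keywords("Chris plays football with Julie"))
print(sentences_for(graph, "Chris"))
```

Keeping the whole sentence on the edge preserves the verb and the word order, which is what allows questions about the relation between two keywords to be answered later.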
From the knowledge graph constructed, it’s possible to answer queries such as ‘Chris plays with whom?’ or ‘Chris played what?’. In this way, we can create a relation between any number of keywords (that are identified as nouns). Note that the example given is a single sentence, so the knowledge graph created for it has a single edge. When the same keywords occur in several different sentences, however, the knowledge graph has multiple edges, and each edge represents the context of the conversation. Hence, in this work we use a classification model to understand the context of the query: we apply the CNN model to the sentences to learn the context and perform classification. A general approach to modelling is to label each sentence in a given corpus. However, this is difficult as it requires considerable manual work. We therefore resort to another way of performing classification without manual labeling. We split the query into words and search for the keywords present in the query. Once the keywords are identified, we try to find any additional words in the query that will help us in context building. Next, the query is labeled with a context (based on the words present in the query and documentation) and used to train the model. Then, we use the recurrently rebuilt model to identify the context of the query to answer the user’s queries.
At the end of the training phase, we obtain the 10-dimensional vector for each query and the corresponding answers. In addition, we understand the contextual relationship between the keywords present in the sentences.
Testing a proposed chat application involves these steps: First, the customer feeds a query in the chat application user interface; next, the query is processed as follows:
Logical operations relevance model: In our experience of designing a new chat application, we understood the importance of conjunctions in queries. For example, suppose the user asks, ‘What is the temperature in Chennai and Mumbai?’ The conjunction ‘and’ connects two different questions and should result in two different answers. Hence, in our chat application, we split such a question into two parts and give the user two separate answers.
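A very simple sketch of such a split is shown below. It only handles the pattern from the example, where ‘and’ joins two single words at the end of the question, and the function name `split_on_and` is illustrative. A production system would need proper parsing to deal with longer phrases or with an ‘and’ that does not join two questions.

```python
import re

def split_on_and(query):
    """Split '<prefix> A and B?' into '<prefix> A?' and '<prefix> B?'.

    Only covers single-word operands at the end of the query (assumption);
    anything else is returned unchanged as a one-element list.
    """
    body = query.strip().rstrip("?")
    match = re.match(r"^(.*\s)(\S+)\s+and\s+(\S+)$", body)
    if not match:
        return [query.strip()]
    prefix, first, second = match.groups()
    return [f"{prefix}{first}?", f"{prefix}{second}?"]

parts = split_on_and("What is the temperature in Chennai and Mumbai?")
for part in parts:
    print(part)
```

Each resulting sub-question can then be passed through the rest of the pipeline independently, and the answers returned together.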
Keyword based probabilistic model: To handle a user query that is similar to one in the dataset but uses different vocabulary, we used a thresholding mechanism to transform the query into the one available in the dataset.
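The exact scoring used for this step is not described here, so the following is only an illustration of the general idea of thresholding. It swaps in a simple keyword overlap score (Jaccard similarity) as the similarity measure, and the threshold value, stop word list and example queries are all made up.

```python
def keyword_set(text, stop_words):
    """Lowercased words of the text without punctuation and stop words."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    return words - set(stop_words) - {""}

def map_to_known_query(query, known_queries, stop_words=(), threshold=0.5):
    """Return (best known query, score), or (None, score) below the threshold.

    The score is the Jaccard overlap of keyword sets (assumed measure).
    """
    query_keys = keyword_set(query, stop_words)
    best, best_score = None, 0.0
    for candidate in known_queries:
        cand_keys = keyword_set(candidate, stop_words)
        union = query_keys | cand_keys
        score = len(query_keys & cand_keys) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score

# Hypothetical dataset queries, for illustration only.
known = ["How do I restart the node", "Where is the license file stored"]
stops = {"the", "i", "do", "can", "is"}
print(map_to_known_query("How can I restart the node?", known, stops))
print(map_to_known_query("What colour is the sky?", known, stops))
```

A pure word overlap score cannot recognize synonyms, so a real system facing genuinely different vocabulary would need a richer similarity measure than this one.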
The output query from the previous step is vectorized using the features learned in the training phase. To find the closest match in the dataset, we calculate the Euclidean distance between the query vector and each vector in the dataset and take the minimum. Based on the minimum value, we extract the dataset query that is closest to the posted one. The obtained query is then displayed to the customer for review.
If, at the end of the previous step, the distance obtained is too large, the corresponding user query may not be available in the database. To handle this, we use a trained CNN model from the knowledge graph to identify the context and pass the category on to the user so that they can refine the query words.
Finally, if both of the cases above fail to resolve the user query and find relevant answers, the query is forwarded to the concerned service engineer to resolve. The resolution is later added to the dataset for future reference.
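The nearest-match lookup can be sketched with NumPy as follows. The dataset vectors and query strings here are placeholders, not data from the project, and `closest_query` is an illustrative name.

```python
import numpy as np

def closest_query(query_vec, dataset_vecs, dataset_queries):
    """Return the dataset query with the smallest Euclidean distance, and that distance."""
    dataset_vecs = np.asarray(dataset_vecs, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    # One distance per dataset row
    distances = np.linalg.norm(dataset_vecs - query_vec, axis=1)
    idx = int(np.argmin(distances))
    return dataset_queries[idx], float(distances[idx])

# Placeholder data: three 10-dimensional vectors and their queries.
dataset_queries = ["query A", "query B", "query C"]
dataset_vecs = np.eye(3, 10)
query_vec = dataset_vecs[1] + 0.01  # a slightly perturbed copy of the second vector

match, distance = closest_query(query_vec, dataset_vecs, dataset_queries)
print(match, round(distance, 4))
```

Returning the distance together with the match lets the caller decide whether the match is close enough to be trusted.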
We obtained 91% accuracy in answering user queries with our chat application, and when tried on different sets of queries, the average precision and recall scores were 76.42% and 89.42% respectively. From the evaluation and manual verification, we found that our new chatbot performed well on product-based user queries.
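For readers unfamiliar with these metrics, the sketch below shows how precision and recall are computed from counts of correct and incorrect answers. The counts in the example are invented purely to illustrate the arithmetic and are not results from this project.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# Invented counts: 8 relevant answers returned, 2 irrelevant answers returned,
# and no relevant answer missed.
precision, recall = precision_recall(8, 2, 0)
print(precision, recall)
```

Precision falls when the chatbot returns answers that are not relevant, while recall falls when it misses answers that it should have returned.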
Experience at Ericsson
It’s been a great experience at Ericsson, Chennai, India. I joined as an Intern under the supervision of Dr. M. Saravanan and Dr. Perepu Satheesh Kumar. My internship at Ericsson has taught me more than I could have imagined; from personal skill development through interaction with people who are adept in so many things to the introduction to machine learning, deep learning and natural language processing. My internship has definitely given me a better understanding of my skill set. The experience has prepared me for a number of things such as office etiquette, working with deadlines and learning from hands-on training.